First, we load all the packages required for this project at once. This allows the project partners to load and work with the same versions of the packages.
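The loading step could look roughly like the sketch below. The package list is an assumption: only `knitr` is inferred here, because `kable()` is called later in the report, and the real project may need more. The variable names `required_packages`, `installed` and `not_installed` are illustrative.

```r
# Assumption: knitr is needed because kable() is called later in this report.
# Extend this vector with whatever else the project actually uses.
required_packages <- c("knitr")

# Check which packages are installed without attaching them yet.
installed <- vapply(required_packages, requireNamespace,
                    logical(1), quietly = TRUE)
not_installed <- required_packages[!installed]

if (length(not_installed) > 0) {
  message("Install first: ", paste(not_installed, collapse = ", "))
} else {
  # Attach every package in one pass.
  invisible(lapply(required_packages, library, character.only = TRUE))
}
```

Loading the packages together does not by itself pin their versions; printing `sessionInfo()` at the end of the report is one way to record which versions were actually used.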
We chose the ECC dataset because it concerns healthcare, the field in which we have the strongest professional interest.
Objective: Early childhood caries (ECC) is a potentially severe disease affecting children all over the world [1]. The available findings are mostly based on a logistic regression model, but data mining could be used to extract more information from the same data set. In the paper, the authors implement association rule mining for interpretability. While model interpretability is important, we look for other classification and clustering methods with better performance.
Second, we import the training, validation and test splits of the ECC dataset.
#READ DATA
TRAIN = read.csv("./ECC_train.csv")
VALIDATION = read.csv("./ECC_validation.csv")
TEST = read.csv("./ECC_test.csv")
## 3. Classification Methods
options(knitr.kable.NA = '')
# summary() gives a brief overview of every attribute.
kable(summary(TRAIN))
| CITY | CHILD_ETHNICITY | CHILD_AGE | CHILD_GENDER | CHILD_SERBIAN_LANGUAGE | MOTHER_AGE | MARITAL_STATUS | MOTHER_ETHNICITY | MOTHER_SERBIAN_LANGUAGE | NUMBER_OF_CHILDREN | BIRTH_ORDER | MOTHER_EDUCATION_LEVEL | MOTHER_EMPLOYMENT_STATUS | QUALITY_OF_HOUSING | HOUSING_CONDITIONS | HOUSEHOLD_MONTHLY_INCOME | BIRTH_WEIGHT | BREASTFEEDING | BREASTFEEDING_FREQUENCY | BREASTFEEDING_DURING_NIGHT | BOTTLE_FEEDING | INFANT_FORMULAS | ADDITIONAL_FOOD_SWEETENING | CHILD_FLUORIDE_SUPPLEMENTS | CHILD_FLUORIDE_TOOTHPASTE | CHILD_ORAL_HYGIENE | CHILD_TOOTH_BRUSHING | DIARRHEA_DURING_INFANCY | MEDICAL_SYRUPS | CHILD_FIRST_DENTIST_VISIT | SWEETS_DURING_PREGNANCY | FLUORIDE_SUPPLEMENTS_DURING_PREGNANCY | ORAL_HEALTH_DURING_PREGNANCY | MOTHER_HEALTH_AWARENESS | FATHER_HEALTH_AWARENESS | ECC | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NOVI_SAD :79 | Min. :1.000 | Min. :1.00 | Min. :1.000 | Min. :1.000 | Min. :1.000 | Min. :1.000 | Min. : 1.00 | Min. :1.000 | Min. :1.0 | Min. :1.000 | Min. :1.000 | Min. :1.000 | Min. :1.000 | Min. :1.0 | Min. :1.000 | Min. :1.000 | Min. :1 | Min. : 1 | Min. : 1.00 | Min. :1.000 | Min. :1.000 | Min. :1.000 | Min. :1.000 | Min. :1.000 | Min. :1.000 | Min. :1.000 | Min. :1.0 | Min. :1.000 | Min. :1.000 | Min. :1.000 | Min. :1.000 | Min. :1.000 | Min. :1.000 | Min. :1.000 | Min. :1.000 | |
| BACKA_PALANKA:42 | 1st Qu.:1.000 | 1st Qu.:3.00 | 1st Qu.:1.000 | 1st Qu.:1.000 | 1st Qu.:2.000 | 1st Qu.:1.000 | 1st Qu.: 1.00 | 1st Qu.:1.000 | 1st Qu.:1.0 | 1st Qu.:1.000 | 1st Qu.:3.000 | 1st Qu.:2.000 | 1st Qu.:1.000 | 1st Qu.:1.0 | 1st Qu.:3.000 | 1st Qu.:2.000 | 1st Qu.:1 | 1st Qu.: 2 | 1st Qu.: 1.00 | 1st Qu.:2.000 | 1st Qu.:1.000 | 1st Qu.:2.000 | 1st Qu.:2.000 | 1st Qu.:1.000 | 1st Qu.:2.000 | 1st Qu.:2.000 | 1st Qu.:2.0 | 1st Qu.:2.000 | 1st Qu.:2.000 | 1st Qu.:2.000 | 1st Qu.:2.000 | 1st Qu.:1.000 | 1st Qu.:2.000 | 1st Qu.:2.000 | 1st Qu.:1.000 | |
| KISAC :29 | Median :1.000 | Median :3.00 | Median :1.000 | Median :1.000 | Median :3.000 | Median :1.000 | Median : 1.00 | Median :2.000 | Median :2.0 | Median :1.000 | Median :3.000 | Median :3.000 | Median :2.000 | Median :1.0 | Median :4.000 | Median :2.000 | Median :2 | Median : 2 | Median : 1.00 | Median :2.000 | Median :2.000 | Median :2.000 | Median :3.000 | Median :1.000 | Median :2.000 | Median :2.000 | Median :2.0 | Median :2.000 | Median :3.000 | Median :2.000 | Median :3.000 | Median :2.000 | Median :2.000 | Median :2.000 | Median :2.000 | |
| RUSKI_KRSTUR :23 | Mean :2.167 | Mean :3.13 | Mean :1.473 | Mean :1.134 | Mean :2.427 | Mean :1.238 | Mean : 22.91 | Mean :1.732 | Mean :1.9 | Mean :1.678 | Mean :3.008 | Mean :2.427 | Mean :1.849 | Mean :1.1 | Mean :3.347 | Mean :1.908 | Mean :2 | Mean :119 | Mean : 93.01 | Mean :2.427 | Mean :1.565 | Mean :2.297 | Mean :2.707 | Mean :1.397 | Mean :1.879 | Mean :2.146 | Mean :1.9 | Mean :2.423 | Mean :3.159 | Mean :1.854 | Mean :2.431 | Mean :1.799 | Mean :2.059 | Mean :1.874 | Mean :1.703 | |
| TITEL :22 | 3rd Qu.:3.000 | 3rd Qu.:4.00 | 3rd Qu.:2.000 | 3rd Qu.:1.000 | 3rd Qu.:3.000 | 3rd Qu.:1.000 | 3rd Qu.: 3.00 | 3rd Qu.:2.000 | 3rd Qu.:2.0 | 3rd Qu.:2.000 | 3rd Qu.:3.000 | 3rd Qu.:3.000 | 3rd Qu.:3.000 | 3rd Qu.:1.0 | 3rd Qu.:4.000 | 3rd Qu.:2.000 | 3rd Qu.:3 | 3rd Qu.: 3 | 3rd Qu.: 1.00 | 3rd Qu.:3.000 | 3rd Qu.:2.000 | 3rd Qu.:3.000 | 3rd Qu.:3.000 | 3rd Qu.:2.000 | 3rd Qu.:2.000 | 3rd Qu.:3.000 | 3rd Qu.:2.0 | 3rd Qu.:3.000 | 3rd Qu.:4.000 | 3rd Qu.:2.000 | 3rd Qu.:3.000 | 3rd Qu.:2.000 | 3rd Qu.:2.000 | 3rd Qu.:2.000 | 3rd Qu.:2.000 | |
| TEMERIN :17 | Max. :7.000 | Max. :5.00 | Max. :2.000 | Max. :2.000 | Max. :3.000 | Max. :3.000 | Max. :999.00 | Max. :2.000 | Max. :3.0 | Max. :3.000 | Max. :4.000 | Max. :4.000 | Max. :3.000 | Max. :2.0 | Max. :5.000 | Max. :2.000 | Max. :4 | Max. :999 | Max. :999.00 | Max. :4.000 | Max. :2.000 | Max. :3.000 | Max. :3.000 | Max. :3.000 | Max. :3.000 | Max. :4.000 | Max. :2.0 | Max. :3.000 | Max. :4.000 | Max. :3.000 | Max. :3.000 | Max. :3.000 | Max. :3.000 | Max. :3.000 | Max. :2.000 | |
| (Other) :27 |
The summary shows that every attribute except “CITY” is stored as a numeric code.
Three attributes (MOTHER_ETHNICITY, BREASTFEEDING_FREQUENCY and BREASTFEEDING_DURING_NIGHT) have a maximum value of ‘999’. This value is not a valid category; it most likely encodes a missing answer, so the ‘999’ entries should be treated as missing data (‘NA’) and replaced.
Most of the attributes are ordinal and should be treated as such. They will be converted with the ‘ordered()’ function.
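The recoding described above can be sketched as follows. This is a minimal illustration on an invented toy data frame, not the project's actual preprocessing: the helper name `recode_missing` and the toy values are assumptions, and which attributes are truly ordinal (rather than nominal, such as ethnicity) remains a modelling decision.

```r
# Toy stand-in for TRAIN: CITY is nominal, the rest are integer-coded.
# All values are invented for illustration only.
toy <- data.frame(
  CITY = c("NOVI_SAD", "KISAC", "TITEL", "TEMERIN"),
  MOTHER_ETHNICITY = c(1, 999, 3, 1),
  BREASTFEEDING_FREQUENCY = c(2, 3, 999, 1),
  CHILD_AGE = c(3, 4, 1, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: treat the sentinel code 999 as missing (NA)
# in every numeric column.
recode_missing <- function(df, sentinel = 999) {
  num_cols <- vapply(df, is.numeric, logical(1))
  df[num_cols] <- lapply(df[num_cols], function(x) {
    x[x %in% sentinel] <- NA
    x
  })
  df
}

toy_clean <- recode_missing(toy)

# Convert the integer-coded attributes to ordered factors.
coded <- setdiff(names(toy_clean), "CITY")
toy_ord <- toy_clean
toy_ord[coded] <- lapply(toy_clean[coded], ordered)

# Number of missing values per column after recoding.
print(colSums(is.na(toy_clean)))
str(toy_ord)
```

Recoding before converting matters: if `ordered()` were applied first, 999 would become the highest level of the factor instead of a missing value.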
for (col in 2:ncol(TRAIN)) {
hist(TRAIN[,col], main = paste("Histogram of", colnames(TRAIN)[col]))
}
The histograms show the distribution of every attribute and make the problem with the ‘999’ values visible.
They also show that most attributes are ordinal and take only a few distinct values.
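The "few classes" observation can also be checked numerically instead of visually. The sketch below counts the distinct values per attribute and tabulates their frequencies on an invented toy data frame; the column values and the names `n_levels` and `counts` are assumptions for illustration.

```r
# Toy stand-in for a few integer-coded attributes (invented values).
toy <- data.frame(
  CHILD_GENDER = c(1, 2, 2, 1, 1, 2),
  MOTHER_AGE = c(1, 2, 3, 3, 2, 3),
  BIRTH_ORDER = c(1, 1, 2, 3, 1, 2)
)

# Number of distinct values observed in each attribute.
n_levels <- vapply(toy, function(x) length(unique(x)), integer(1))
print(n_levels)

# Frequency table of each attribute; one table per column.
counts <- lapply(toy, table)
print(counts$MOTHER_AGE)
```

For attributes with this few values, passing one of these tables to `barplot()` gives one bar per category, which can be easier to read than a histogram with automatic bin edges.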
for (col in 2:ncol(TRAIN)) {
qqnorm(TRAIN[,col], main = paste("Normal QQ Plot of ",colnames(TRAIN)[col])); qqline(TRAIN[,col])
}
A Q-Q plot indicates how close the distribution of an attribute is to a normal distribution: if the data are normally distributed, the points in the normal Q-Q plot lie on a straight diagonal line.
Because most of our attributes are categorical, we do not expect them to be normally distributed. Still, the Q-Q plots let us observe how the category codes are distributed.
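To make the reasoning concrete, the sketch below computes the points of a normal Q-Q plot by hand: the sorted sample is paired with normal quantiles at the probabilities returned by `ppoints()`. The data are simulated, and the helper name `qq_points` is an assumption; it is shown only to illustrate why integer-coded attributes produce steps rather than a diagonal line.

```r
set.seed(1)

# Simulated data: one continuous normal sample, one 3-category code.
x_cont <- rnorm(200)
x_disc <- sample(1:3, 200, replace = TRUE)

# Hypothetical helper: pair sorted sample values with the
# corresponding theoretical normal quantiles.
qq_points <- function(x) {
  x <- sort(x[!is.na(x)])
  data.frame(theoretical = qnorm(ppoints(length(x))), sample = x)
}

qq_cont <- qq_points(x_cont)
qq_disc <- qq_points(x_disc)

# Near 1 for the normal sample: points lie close to a straight line.
print(cor(qq_cont$theoretical, qq_cont$sample))

# The coded sample takes only a few heights, so its Q-Q plot is a staircase.
print(length(unique(qq_disc$sample)))
```

The width of each step in the staircase reflects the share of observations in that category, which is why the Q-Q plots still convey the distribution of the category codes.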
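# Geometric mean of each numeric attribute, computed as exp(mean(log(x))).
# Column 1 (CITY) is not numeric, so it is skipped and keeps the value 0.
geomean = matrix(0,36,1)
for (col in 2:ncol(TRAIN)) {
geomean[col] = exp(mean(log(TRAIN[,col])))
}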
geomean_vector <- data.frame(geomean)
row.names(geomean_vector) <- colnames(TRAIN)
kable(geomean_vector,row.names = TRUE)
| | geomean |
|---|---|
| CITY | 0.000000 |
| CHILD_ETHNICITY | 1.758148 |
| CHILD_AGE | 3.018245 |
| CHILD_GENDER | 1.387803 |
| CHILD_SERBIAN_LANGUAGE | 1.097249 |
| MOTHER_AGE | 2.285236 |
| MARITAL_STATUS | 1.158650 |
| MOTHER_ETHNICITY | 1.887356 |
| MOTHER_SERBIAN_LANGUAGE | 1.661191 |
| NUMBER_OF_CHILDREN | 1.741823 |
| BIRTH_ORDER | 1.513556 |
| MOTHER_EDUCATION_LEVEL | 2.908949 |
| MOTHER_EMPLOYMENT_STATUS | 2.217976 |
| QUALITY_OF_HOUSING | 1.632377 |
| HOUSING_CONDITIONS | 1.072084 |
| HOUSEHOLD_MONTHLY_INCOME | 3.108337 |
| BIRTH_WEIGHT | 1.876377 |
| BREASTFEEDING | 1.754348 |
| BREASTFEEDING_FREQUENCY | 4.413374 |
| BREASTFEEDING_DURING_NIGHT | 2.084179 |
| BOTTLE_FEEDING | 2.240267 |
| INFANT_FORMULAS | 1.479237 |
| ADDITIONAL_FOOD_SWEETENING | 2.165548 |
| CHILD_FLUORIDE_SUPPLEMENTS | 2.638544 |
| CHILD_FLUORIDE_TOOTHPASTE | 1.272027 |
| CHILD_ORAL_HYGIENE | 1.799259 |
| CHILD_TOOTH_BRUSHING | 1.968312 |
| DIARRHEA_DURING_INFANCY | 1.865525 |
| MEDICAL_SYRUPS | 2.368098 |
| CHILD_FIRST_DENTIST_VISIT | 2.985174 |
| SWEETS_DURING_PREGNANCY | 1.783182 |
| FLUORIDE_SUPPLEMENTS_DURING_PREGNANCY | 2.259006 |
| ORAL_HEALTH_DURING_PREGNANCY | 1.650330 |
| MOTHER_HEALTH_AWARENESS | 1.992148 |
| FATHER_HEALTH_AWARENESS | 1.783284 |
| ECC | 1.627806 |
Besides the central tendency, we also need to know how closely the data fall around the center, that is, the spread pattern around the center.
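The spread measures used in this section (range, interquartile range, variance and coefficient of variation) can be collected in a single table. The sketch below does this with `sapply()` on an invented toy data frame; the values and the name `spread` are assumptions, and `na.rm = TRUE` is used on the assumption that missing codes have already been recoded to `NA`.

```r
# Toy stand-in for two numeric attributes (invented values, one NA).
toy <- data.frame(
  CHILD_AGE = c(1, 3, 3, 4, 5),
  MOTHER_AGE = c(2, 2, 3, 3, NA)
)

# One row per attribute, one column per spread measure.
spread <- data.frame(
  range = sapply(toy, function(x) diff(range(x, na.rm = TRUE))),
  iqr = sapply(toy, IQR, na.rm = TRUE),
  variance = sapply(toy, var, na.rm = TRUE),
  cv_percent = sapply(toy, function(x) {
    sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE) * 100
  })
)

print(spread)
```

Range and variance react strongly to a single extreme value, while the interquartile range barely moves, so comparing them for one attribute is a quick way to spot outlying codes.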
rangeVector = matrix(0,36,1)
for (col in 2:ncol(TRAIN)) {
rangeVector[col] = max(TRAIN[,col], na.rm = TRUE)-min(TRAIN[,col], na.rm = TRUE)
}
range_Vector <- data.frame(rangeVector)
row.names(range_Vector) <- colnames(TRAIN)
kable(range_Vector,row.names = TRUE)
| | rangeVector |
|---|---|
| CITY | 0 |
| CHILD_ETHNICITY | 6 |
| CHILD_AGE | 4 |
| CHILD_GENDER | 1 |
| CHILD_SERBIAN_LANGUAGE | 1 |
| MOTHER_AGE | 2 |
| MARITAL_STATUS | 2 |
| MOTHER_ETHNICITY | 998 |
| MOTHER_SERBIAN_LANGUAGE | 1 |
| NUMBER_OF_CHILDREN | 2 |
| BIRTH_ORDER | 2 |
| MOTHER_EDUCATION_LEVEL | 3 |
| MOTHER_EMPLOYMENT_STATUS | 3 |
| QUALITY_OF_HOUSING | 2 |
| HOUSING_CONDITIONS | 1 |
| HOUSEHOLD_MONTHLY_INCOME | 4 |
| BIRTH_WEIGHT | 1 |
| BREASTFEEDING | 3 |
| BREASTFEEDING_FREQUENCY | 998 |
| BREASTFEEDING_DURING_NIGHT | 998 |
| BOTTLE_FEEDING | 3 |
| INFANT_FORMULAS | 1 |
| ADDITIONAL_FOOD_SWEETENING | 2 |
| CHILD_FLUORIDE_SUPPLEMENTS | 2 |
| CHILD_FLUORIDE_TOOTHPASTE | 2 |
| CHILD_ORAL_HYGIENE | 2 |
| CHILD_TOOTH_BRUSHING | 3 |
| DIARRHEA_DURING_INFANCY | 1 |
| MEDICAL_SYRUPS | 2 |
| CHILD_FIRST_DENTIST_VISIT | 3 |
| SWEETS_DURING_PREGNANCY | 2 |
| FLUORIDE_SUPPLEMENTS_DURING_PREGNANCY | 2 |
| ORAL_HEALTH_DURING_PREGNANCY | 2 |
| MOTHER_HEALTH_AWARENESS | 2 |
| FATHER_HEALTH_AWARENESS | 2 |
| ECC | 1 |
iqc = matrix(0,36,1)
for (col in 2:ncol(TRAIN)) {
iqc[col] = IQR(TRAIN[,col])
}
iqr_vector <- data.frame(iqc)
row.names(iqr_vector) <- colnames(TRAIN)
kable(iqr_vector, row.names = TRUE)
| | iqc |
|---|---|
| CITY | 0 |
| CHILD_ETHNICITY | 2 |
| CHILD_AGE | 1 |
| CHILD_GENDER | 1 |
| CHILD_SERBIAN_LANGUAGE | 0 |
| MOTHER_AGE | 1 |
| MARITAL_STATUS | 0 |
| MOTHER_ETHNICITY | 2 |
| MOTHER_SERBIAN_LANGUAGE | 1 |
| NUMBER_OF_CHILDREN | 1 |
| BIRTH_ORDER | 1 |
| MOTHER_EDUCATION_LEVEL | 0 |
| MOTHER_EMPLOYMENT_STATUS | 1 |
| QUALITY_OF_HOUSING | 2 |
| HOUSING_CONDITIONS | 0 |
| HOUSEHOLD_MONTHLY_INCOME | 1 |
| BIRTH_WEIGHT | 0 |
| BREASTFEEDING | 2 |
| BREASTFEEDING_FREQUENCY | 1 |
| BREASTFEEDING_DURING_NIGHT | 0 |
| BOTTLE_FEEDING | 1 |
| INFANT_FORMULAS | 1 |
| ADDITIONAL_FOOD_SWEETENING | 1 |
| CHILD_FLUORIDE_SUPPLEMENTS | 1 |
| CHILD_FLUORIDE_TOOTHPASTE | 1 |
| CHILD_ORAL_HYGIENE | 0 |
| CHILD_TOOTH_BRUSHING | 1 |
| DIARRHEA_DURING_INFANCY | 0 |
| MEDICAL_SYRUPS | 1 |
| CHILD_FIRST_DENTIST_VISIT | 2 |
| SWEETS_DURING_PREGNANCY | 0 |
| FLUORIDE_SUPPLEMENTS_DURING_PREGNANCY | 1 |
| ORAL_HEALTH_DURING_PREGNANCY | 1 |
| MOTHER_HEALTH_AWARENESS | 0 |
| FATHER_HEALTH_AWARENESS | 0 |
| ECC | 1 |
variance = matrix(0,36,1)
for (col in 2:ncol(TRAIN)) {
variance[col] = var(TRAIN[,col])
}
var_vector <- data.frame(variance)
row.names(var_vector) <- colnames(TRAIN)
kable(var_vector, row.names = TRUE)
| | variance |
|---|---|
| CITY | 0.000000e+00 |
| CHILD_ETHNICITY | 2.198762e+00 |
| CHILD_AGE | 6.091558e-01 |
| CHILD_GENDER | 2.503077e-01 |
| CHILD_SERBIAN_LANGUAGE | 1.164516e-01 |
| MOTHER_AGE | 5.229774e-01 |
| MARITAL_STATUS | 3.084280e-01 |
| MOTHER_ETHNICITY | 2.044553e+04 |
| MOTHER_SERBIAN_LANGUAGE | 1.968988e-01 |
| NUMBER_OF_CHILDREN | 5.697057e-01 |
| BIRTH_ORDER | 6.058507e-01 |
| MOTHER_EDUCATION_LEVEL | 4.537112e-01 |
| MOTHER_EMPLOYMENT_STATUS | 7.414648e-01 |
| QUALITY_OF_HOUSING | 8.175521e-01 |
| HOUSING_CONDITIONS | 9.071410e-02 |
| HOUSEHOLD_MONTHLY_INCOME | 1.194016e+00 |
| BIRTH_WEIGHT | 8.392810e-02 |
| BREASTFEEDING | 1.058823e+00 |
| BREASTFEEDING_FREQUENCY | 1.032018e+05 |
| BREASTFEEDING_DURING_NIGHT | 8.356663e+04 |
| BOTTLE_FEEDING | 8.254984e-01 |
| INFANT_FORMULAS | 2.468268e-01 |
| ADDITIONAL_FOOD_SWEETENING | 4.954116e-01 |
| CHILD_FLUORIDE_SUPPLEMENTS | 2.752013e-01 |
| CHILD_FLUORIDE_TOOTHPASTE | 4.841954e-01 |
| CHILD_ORAL_HYGIENE | 2.583243e-01 |
| CHILD_TOOTH_BRUSHING | 7.557751e-01 |
| DIARRHEA_DURING_INFANCY | 9.071410e-02 |
| MEDICAL_SYRUPS | 2.618403e-01 |
| CHILD_FIRST_DENTIST_VISIT | 8.653704e-01 |
| SWEETS_DURING_PREGNANCY | 2.179600e-01 |
| FLUORIDE_SUPPLEMENTS_DURING_PREGNANCY | 6.160121e-01 |
| ORAL_HEALTH_DURING_PREGNANCY | 5.309237e-01 |
| MOTHER_HEALTH_AWARENESS | 2.486551e-01 |
| FATHER_HEALTH_AWARENESS | 3.035055e-01 |
| ECC | 2.096973e-01 |
CV = matrix(0,36,1)
for (col in 2:ncol(TRAIN)) {
CV[col] = sd(TRAIN[,col], na.rm=TRUE)/mean(TRAIN[,col], na.rm=TRUE)*100
}
CV_vector <- data.frame(CV)
row.names(CV_vector) <- colnames(TRAIN)
kable(CV_vector, row.names = TRUE)
| | CV |
|---|---|
| CITY | 0.00000 |
| CHILD_ETHNICITY | 68.41594 |
| CHILD_AGE | 24.93794 |
| CHILD_GENDER | 33.96975 |
| CHILD_SERBIAN_LANGUAGE | 30.09548 |
| MOTHER_AGE | 29.79966 |
| MARITAL_STATUS | 44.84180 |
| MOTHER_ETHNICITY | 624.07043 |
| MOTHER_SERBIAN_LANGUAGE | 25.61646 |
| NUMBER_OF_CHILDREN | 39.73446 |
| BIRTH_ORDER | 46.39128 |
| MOTHER_EDUCATION_LEVEL | 22.39024 |
| MOTHER_EMPLOYMENT_STATUS | 35.48258 |
| QUALITY_OF_HOUSING | 48.89150 |
| HOUSING_CONDITIONS | 27.37030 |
| HOUSEHOLD_MONTHLY_INCOME | 32.64472 |
| BIRTH_WEIGHT | 15.18402 |
| BREASTFEEDING | 51.44958 |
| BREASTFEEDING_FREQUENCY | 270.01530 |
| BREASTFEEDING_DURING_NIGHT | 310.80960 |
| BOTTLE_FEEDING | 37.43933 |
| INFANT_FORMULAS | 31.74844 |
| ADDITIONAL_FOOD_SWEETENING | 30.64140 |
| CHILD_FLUORIDE_SUPPLEMENTS | 19.37844 |
| CHILD_FLUORIDE_TOOTHPASTE | 49.79225 |
| CHILD_ORAL_HYGIENE | 27.05417 |
| CHILD_TOOTH_BRUSHING | 40.50203 |
| DIARRHEA_DURING_INFANCY | 15.85548 |
| MEDICAL_SYRUPS | 21.12212 |
| CHILD_FIRST_DENTIST_VISIT | 29.44774 |
| SWEETS_DURING_PREGNANCY | 25.18735 |
| FLUORIDE_SUPPLEMENTS_DURING_PREGNANCY | 32.28616 |
| ORAL_HEALTH_DURING_PREGNANCY | 40.49911 |
| MOTHER_HEALTH_AWARENESS | 24.22320 |
| FATHER_HEALTH_AWARENESS | 29.39024 |
| ECC | 26.89056 |
options(knitr.kable.NA = '')
NUM=data.frame(TRAIN[2:36])
# correlations/covariance
kable(cov(NUM))
| | CHILD_ETHNICITY | CHILD_AGE | CHILD_GENDER | CHILD_SERBIAN_LANGUAGE | MOTHER_AGE | MARITAL_STATUS | MOTHER_ETHNICITY | MOTHER_SERBIAN_LANGUAGE | NUMBER_OF_CHILDREN | BIRTH_ORDER | MOTHER_EDUCATION_LEVEL | MOTHER_EMPLOYMENT_STATUS | QUALITY_OF_HOUSING | HOUSING_CONDITIONS | HOUSEHOLD_MONTHLY_INCOME | BIRTH_WEIGHT | BREASTFEEDING | BREASTFEEDING_FREQUENCY | BREASTFEEDING_DURING_NIGHT | BOTTLE_FEEDING | INFANT_FORMULAS | ADDITIONAL_FOOD_SWEETENING | CHILD_FLUORIDE_SUPPLEMENTS | CHILD_FLUORIDE_TOOTHPASTE | CHILD_ORAL_HYGIENE | CHILD_TOOTH_BRUSHING | DIARRHEA_DURING_INFANCY | MEDICAL_SYRUPS | CHILD_FIRST_DENTIST_VISIT | SWEETS_DURING_PREGNANCY | FLUORIDE_SUPPLEMENTS_DURING_PREGNANCY | ORAL_HEALTH_DURING_PREGNANCY | MOTHER_HEALTH_AWARENESS | FATHER_HEALTH_AWARENESS | ECC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CHILD_ETHNICITY | 2.1987624 | -0.2024718 | 0.0339826 | 0.1581695 | -0.2397947 | 0.0481523 | 52.6828346 | -0.1776836 | 0.2899863 | 0.3062480 | -0.5056081 | -0.4834921 | 0.1135509 | 0.1806019 | -0.7306354 | -0.0727647 | 0.2857143 | 4.749161e+01 | 1.3431314 | 0.0333146 | -0.0403115 | 0.0467107 | 0.0702331 | 0.2147076 | 0.1758553 | 0.2232868 | -0.1511902 | 0.0760346 | 0.2295805 | -0.0888330 | 0.0830315 | 0.3362751 | -0.2241307 | -0.3318449 | -0.1265427 |
| CHILD_AGE | -0.2024718 | 0.6091558 | 0.0266517 | -0.0300447 | 0.1040751 | 0.0109525 | 9.6080834 | 0.0138708 | -0.0079287 | -0.0336662 | 0.0703386 | 0.0788650 | -0.0434056 | -0.0256848 | 0.1480433 | 0.0119897 | -0.0672269 | -1.105976e+01 | -7.7784009 | -0.0303787 | 0.0440737 | -0.0134841 | -0.0416828 | -0.0391688 | -0.0178088 | -0.1661334 | 0.0130797 | -0.0340354 | -0.1005415 | 0.0190746 | -0.0309237 | -0.0452692 | 0.0427903 | 0.0415597 | 0.0344925 |
| CHILD_GENDER | 0.0339826 | 0.0266517 | 0.2503077 | -0.0131500 | -0.0639745 | 0.0086143 | -5.6851728 | -0.0325235 | 0.0056608 | -0.0151014 | -0.0207799 | -0.0051510 | 0.0421047 | 0.0111459 | 0.0073837 | -0.0025140 | -0.0336134 | -1.356791e+01 | -14.2560740 | 0.0368658 | 0.0007208 | 0.0060124 | -0.0164024 | 0.0087550 | -0.0012130 | -0.0485215 | 0.0056608 | -0.0199712 | -0.0208678 | 0.0233114 | 0.0558876 | -0.0096867 | -0.0110052 | 0.0091769 | 0.0444077 |
| CHILD_SERBIAN_LANGUAGE | 0.1581695 | -0.0300447 | -0.0131500 | 0.1164516 | -0.0447769 | 0.0057487 | -2.6184382 | -0.0228192 | 0.0555184 | 0.0559228 | -0.0725537 | -0.0615836 | 0.0160508 | 0.0159101 | -0.0887100 | -0.0128336 | 0.0000000 | -3.126877e+00 | -3.9717134 | 0.0014416 | -0.0087198 | 0.0188812 | 0.0099680 | 0.0137829 | 0.0247178 | 0.0223269 | -0.0243135 | -0.0148026 | 0.0626560 | 0.0154882 | 0.0008790 | 0.0354066 | -0.0246827 | -0.0167364 | -0.0272846 |
| MOTHER_AGE | -0.2397947 | 0.1040751 | -0.0639745 | -0.0447769 | 0.5229774 | -0.0517914 | 3.3317746 | 0.0601420 | 0.0304314 | 0.0708484 | 0.1728842 | 0.1742379 | -0.1245209 | -0.0724482 | 0.2713336 | 0.0562568 | -0.0378151 | 1.662213e-01 | 6.7569178 | -0.0064344 | -0.0067860 | 0.0533561 | -0.0467459 | -0.0905207 | -0.0782497 | -0.0585598 | 0.0514398 | 0.0121655 | -0.0723427 | 0.0333497 | -0.0880595 | -0.1114061 | 0.0967441 | 0.0790057 | 0.0096691 |
| MARITAL_STATUS | 0.0481523 | 0.0109525 | 0.0086143 | 0.0057487 | -0.0517914 | 0.3084280 | 3.4160015 | -0.0535143 | -0.0305721 | -0.0194789 | -0.0482226 | -0.0349847 | 0.0276713 | 0.0347737 | -0.0789705 | 0.0052389 | 0.0294118 | 1.392198e+01 | 7.3551387 | -0.0223797 | -0.0050280 | -0.0081221 | 0.0113217 | 0.0434584 | 0.0710770 | 0.0951795 | -0.0221687 | -0.0675961 | 0.0291481 | -0.0111459 | 0.0564502 | 0.0018811 | -0.0644492 | -0.0329630 | -0.0212897 |
| MOTHER_ETHNICITY | 52.6828346 | 9.6080834 | -5.6851728 | -2.6184382 | 3.3317746 | 3.4160015 | 20445.5258606 | 1.2410780 | 2.3230723 | 2.7782954 | -0.6589255 | -1.0085616 | -0.9502655 | -1.9113076 | 4.5684575 | 1.8406174 | 4.4117647 | 1.753670e+03 | -1927.9782532 | -0.5337717 | -3.4669667 | 2.2320945 | 1.9909637 | 0.2367533 | -1.4686896 | -2.8484231 | -2.2147428 | -0.3912837 | 1.0518442 | -5.4078795 | -4.8317218 | 0.2974052 | 2.7656728 | -10.2085546 | -6.4421785 |
| MOTHER_SERBIAN_LANGUAGE | -0.1776836 | 0.0138708 | -0.0325235 | -0.0228192 | 0.0601420 | -0.0535143 | 1.2410780 | 0.1968988 | -0.0732218 | -0.0446187 | 0.0694772 | 0.0937555 | -0.0363032 | -0.0402236 | 0.1101930 | 0.0172638 | -0.0084034 | 6.266358e+00 | 3.7417461 | -0.0196899 | 0.0216413 | -0.0251573 | -0.0241377 | -0.0611793 | -0.0536374 | -0.0866707 | 0.0234169 | -0.0040083 | -0.0412784 | 0.0068387 | -0.0311698 | -0.0414015 | 0.0409620 | 0.0166661 | -0.0000527 |
| NUMBER_OF_CHILDREN | 0.2899863 | -0.0079287 | 0.0056608 | 0.0555184 | 0.0304314 | -0.0305721 | 2.3230723 | -0.0732218 | 0.5697057 | 0.4759151 | -0.1504167 | -0.2174677 | 0.0604409 | 0.0689498 | -0.2885095 | -0.0260891 | -0.0420168 | -9.124380e+00 | -11.7260469 | 0.0304314 | 0.0065399 | -0.0036567 | 0.0166837 | 0.0064695 | 0.0591927 | 0.0483809 | -0.0521430 | 0.0005977 | 0.1714954 | -0.0525825 | -0.0069618 | 0.0973946 | -0.1159418 | -0.0798847 | -0.0383601 |
| BIRTH_ORDER | 0.3062480 | -0.0336662 | -0.0151014 | 0.0559228 | 0.0708484 | -0.0194789 | 2.7782954 | -0.0446187 | 0.4759151 | 0.6058507 | -0.1527548 | -0.2064625 | 0.0437045 | 0.0661018 | -0.2657959 | 0.0038325 | -0.0042017 | -4.066946e+00 | -8.0309061 | 0.0162266 | 0.0020745 | 0.0204810 | 0.0102844 | 0.0025491 | 0.0783904 | 0.0893956 | -0.0408917 | -0.0313456 | 0.1186667 | -0.0347737 | -0.0244366 | 0.1156957 | -0.1070989 | -0.0826272 | -0.0666995 |
| MOTHER_EDUCATION_LEVEL | -0.5056081 | 0.0703386 | -0.0207799 | -0.0725537 | 0.1728842 | -0.0482226 | -0.6589255 | 0.0694772 | -0.1504167 | -0.1527548 | 0.4537112 | 0.2989346 | -0.1247846 | -0.1016842 | 0.4634682 | 0.0427903 | -0.0546218 | -9.432562e+00 | -0.7563728 | 0.0342288 | 0.0120601 | 0.0563271 | -0.0521606 | -0.1083823 | -0.0998207 | -0.1188777 | 0.0890791 | -0.0287613 | -0.1399916 | 0.0642558 | -0.0750501 | -0.1705812 | 0.1759783 | 0.1607187 | 0.0823283 |
| MOTHER_EMPLOYMENT_STATUS | -0.4834921 | 0.0788650 | -0.0051510 | -0.0615836 | 0.1742379 | -0.0349847 | -1.0085616 | 0.0937555 | -0.2174677 | -0.2064625 | 0.2989346 | 0.7414648 | -0.1287226 | -0.1102634 | 0.5192328 | 0.0646602 | 0.0126050 | 8.527566e+00 | 15.1098590 | 0.0439858 | 0.0268275 | 0.0953729 | -0.0593509 | -0.1031258 | -0.0740480 | -0.0837699 | 0.0766499 | 0.0037622 | -0.1521747 | 0.0375514 | -0.0418410 | -0.1366162 | 0.1765761 | 0.1630393 | 0.0852994 |
| QUALITY_OF_HOUSING | 0.1135509 | -0.0434056 | 0.0421047 | 0.0160508 | -0.1245209 | 0.0276713 | -0.9502655 | -0.0363032 | 0.0604409 | 0.0437045 | -0.1247846 | -0.1287226 | 0.8175521 | 0.0488028 | -0.2037727 | -0.0265286 | 0.0546218 | -7.444974e+00 | -11.2382300 | -0.0488907 | -0.0111986 | 0.0239267 | 0.0691431 | 0.0349144 | 0.0110580 | 0.0431595 | -0.0277944 | 0.0303084 | 0.0198481 | 0.0324707 | 0.1282128 | 0.0914701 | -0.0793748 | -0.0778102 | -0.0365318 |
| HOUSING_CONDITIONS | 0.1806019 | -0.0256848 | 0.0111459 | 0.0159101 | -0.0724482 | 0.0347737 | -1.9113076 | -0.0402236 | 0.0689498 | 0.0661018 | -0.1016842 | -0.1102634 | 0.0488028 | 0.0907141 | -0.1358602 | -0.0159277 | 0.0420168 | 8.008509e-01 | -5.0638691 | 0.0073837 | -0.0065399 | 0.0120601 | 0.0253331 | 0.0649590 | 0.0626560 | 0.0692662 | -0.0444956 | 0.0036040 | 0.0427903 | -0.0272494 | 0.0531803 | 0.0916810 | -0.0773355 | -0.0671741 | -0.0120601 |
| HOUSEHOLD_MONTHLY_INCOME | -0.7306354 | 0.1480433 | 0.0073837 | -0.0887100 | 0.2713336 | -0.0789705 | 4.5684575 | 0.1101930 | -0.2885095 | -0.2657959 | 0.4634682 | 0.5192328 | -0.2037727 | -0.1358602 | 1.1940157 | 0.0993284 | -0.0588235 | -2.405427e+01 | -6.8810696 | 0.0738546 | 0.0509124 | 0.1400970 | -0.0323125 | -0.1596287 | -0.0879364 | -0.1309026 | 0.1442636 | -0.0381316 | -0.2445238 | 0.0846841 | -0.0830667 | -0.2072712 | 0.2484793 | 0.2118421 | 0.0951971 |
| BIRTH_WEIGHT | -0.0727647 | 0.0119897 | -0.0025140 | -0.0128336 | 0.0562568 | 0.0052389 | 1.8406174 | 0.0172638 | -0.0260891 | 0.0038325 | 0.0427903 | 0.0646602 | -0.0265286 | -0.0159277 | 0.0993284 | 0.0839281 | 0.0000000 | -1.783833e+00 | 0.1058155 | 0.0058366 | 0.0059949 | 0.0022503 | 0.0065399 | -0.0094758 | -0.0112162 | 0.0177385 | 0.0201294 | -0.0113568 | 0.0020921 | 0.0158750 | -0.0063816 | -0.0311698 | 0.0180198 | 0.0178088 | 0.0271615 |
| BREASTFEEDING | 0.2857143 | -0.0672269 | -0.0336134 | 0.0000000 | -0.0378151 | 0.0294118 | 4.4117647 | -0.0084034 | -0.0420168 | -0.0042017 | -0.0546218 | 0.0126050 | 0.0546218 | 0.0420168 | -0.0588235 | 0.0000000 | 1.0588235 | 2.220378e+02 | 184.4873950 | -0.1722689 | 0.0210084 | 0.0672269 | 0.0294118 | 0.0714286 | 0.0000000 | 0.1470588 | -0.0420168 | -0.0042017 | -0.0126050 | -0.0126050 | 0.0714286 | 0.1176471 | -0.0588235 | -0.0630252 | -0.0252101 |
| BREASTFEEDING_FREQUENCY | 47.4916142 | -11.0597553 | -13.5679125 | -3.1268767 | 0.1662213 | 13.9219788 | 1753.6700538 | 6.2663584 | -9.1243803 | -4.0669456 | -9.4325621 | 8.5275658 | -7.4449738 | 0.8008509 | -24.0542702 | -1.7838332 | 222.0378151 | 1.032018e+05 | 81188.4077740 | -158.8968039 | -41.0781970 | 2.8436236 | -7.5073837 | 20.4049787 | -15.0492775 | 24.7810028 | -0.8092542 | 0.7039309 | 27.4577898 | -12.1717591 | -4.3966984 | 11.0369537 | -15.2968426 | -2.0703913 | -2.9024472 |
| BREASTFEEDING_DURING_NIGHT | 1.3431314 | -7.7784009 | -14.2560740 | -3.9717134 | 6.7569178 | 7.3551387 | -1927.9782532 | 3.7417461 | -11.7260469 | -8.0309061 | -0.7563728 | 15.1098590 | -11.2382300 | -5.0638691 | -6.8810696 | 0.1058155 | 184.4873950 | 8.118841e+04 | 83566.6301818 | -127.4657712 | -31.1560072 | 1.9344784 | -2.3336732 | 9.5008614 | -18.1670476 | 20.0365845 | 0.8831968 | 7.1224992 | 10.4734538 | 0.9171970 | -6.2053022 | -2.4352871 | -9.5845259 | 3.2027355 | -1.9302767 |
| BOTTLE_FEEDING | 0.0333146 | -0.0303787 | 0.0368658 | 0.0014416 | -0.0064344 | -0.0223797 | -0.5337717 | -0.0196899 | 0.0304314 | 0.0162266 | 0.0342288 | 0.0439858 | -0.0488907 | 0.0073837 | 0.0738546 | 0.0058366 | -0.1722689 | -1.588968e+02 | -127.4657712 | 0.8254984 | 0.1780880 | 0.0575578 | 0.0162793 | -0.0485039 | 0.0646074 | -0.0249464 | 0.0136247 | -0.0256496 | -0.0135192 | 0.0165430 | 0.0169825 | -0.0105657 | 0.0085088 | -0.0302380 | -0.0365493 |
| INFANT_FORMULAS | -0.0403115 | 0.0440737 | 0.0007208 | -0.0087198 | -0.0067860 | -0.0050280 | -3.4669667 | 0.0216413 | 0.0065399 | 0.0020745 | 0.0120601 | 0.0268275 | -0.0111986 | -0.0065399 | 0.0509124 | 0.0059949 | 0.0210084 | -4.107820e+01 | -31.1560072 | 0.1780880 | 0.2468268 | -0.0088429 | 0.0484863 | -0.0069794 | 0.0184065 | 0.0051686 | 0.0233466 | -0.0338244 | 0.0022503 | 0.0074364 | 0.0412609 | -0.0121304 | -0.0164200 | -0.0170353 | 0.0130445 |
| ADDITIONAL_FOOD_SWEETENING | 0.0467107 | -0.0134841 | 0.0060124 | 0.0188812 | 0.0533561 | -0.0081221 | 2.2320945 | -0.0251573 | -0.0036567 | 0.0204810 | 0.0563271 | 0.0953729 | 0.0239267 | 0.0120601 | 0.1400970 | 0.0022503 | 0.0672269 | 2.843624e+00 | 1.9344784 | 0.0575578 | -0.0088429 | 0.4954116 | 0.0159453 | -0.0219402 | 0.0025843 | -0.0142752 | 0.0173517 | -0.0252277 | -0.0642382 | 0.0310819 | -0.0025140 | -0.0451285 | 0.0665588 | 0.0332443 | 0.0340002 |
| CHILD_FLUORIDE_SUPPLEMENTS | 0.0702331 | -0.0416828 | -0.0164024 | 0.0099680 | -0.0467459 | 0.0113217 | 1.9909637 | -0.0241377 | 0.0166837 | 0.0102844 | -0.0521606 | -0.0593509 | 0.0691431 | 0.0253331 | -0.0323125 | 0.0065399 | 0.0294118 | -7.507384e+00 | -2.3336732 | 0.0162793 | 0.0484863 | 0.0159453 | 0.2752013 | 0.0622868 | 0.0399423 | 0.0052565 | -0.0043247 | 0.0150487 | 0.1223937 | -0.0262649 | 0.1015435 | 0.0585774 | -0.0878134 | -0.0369185 | -0.0075419 |
| CHILD_FLUORIDE_TOOTHPASTE | 0.2147076 | -0.0391688 | 0.0087550 | 0.0137829 | -0.0905207 | 0.0434584 | 0.2367533 | -0.0611793 | 0.0064695 | 0.0025491 | -0.1083823 | -0.1031258 | 0.0349144 | 0.0649590 | -0.1596287 | -0.0094758 | 0.0714286 | 2.040498e+01 | 9.5008614 | -0.0485039 | -0.0069794 | -0.0219402 | 0.0622868 | 0.4841954 | 0.0316269 | 0.1054112 | -0.0355473 | 0.0119897 | 0.0457790 | -0.0255793 | 0.0926831 | 0.0801660 | -0.0948103 | -0.0633417 | -0.0116733 |
| CHILD_ORAL_HYGIENE | 0.1758553 | -0.0178088 | -0.0012130 | 0.0247178 | -0.0782497 | 0.0710770 | -1.4686896 | -0.0536374 | 0.0591927 | 0.0783904 | -0.0998207 | -0.0740480 | 0.0110580 | 0.0626560 | -0.0879364 | -0.0112162 | 0.0000000 | -1.504928e+01 | -18.1670476 | 0.0646074 | 0.0184065 | 0.0025843 | 0.0399423 | 0.0316269 | 0.2583243 | 0.1817095 | -0.0374459 | -0.0157343 | 0.0823986 | -0.0388524 | 0.0735206 | 0.1057804 | -0.0895011 | -0.0573116 | -0.0319961 |
| CHILD_TOOTH_BRUSHING | 0.2232868 | -0.1661334 | -0.0485215 | 0.0223269 | -0.0585598 | 0.0951795 | -2.8484231 | -0.0866707 | 0.0483809 | 0.0893956 | -0.1188777 | -0.0837699 | 0.0431595 | 0.0692662 | -0.1309026 | 0.0177385 | 0.1470588 | 2.478100e+01 | 20.0365845 | -0.0249464 | 0.0051686 | -0.0142752 | 0.0052565 | 0.1054112 | 0.1817095 | 0.7557751 | -0.0608628 | 0.0428958 | 0.1698956 | -0.0288844 | 0.0710770 | 0.0841567 | -0.0800429 | -0.0655743 | -0.0319433 |
| DIARRHEA_DURING_INFANCY | -0.1511902 | 0.0130797 | 0.0056608 | -0.0243135 | 0.0514398 | -0.0221687 | -2.2147428 | 0.0234169 | -0.0521430 | -0.0408917 | 0.0890791 | 0.0766499 | -0.0277944 | -0.0444956 | 0.1442636 | 0.0201294 | -0.0420168 | -8.092542e-01 | 0.8831968 | 0.0136247 | 0.0233466 | 0.0173517 | -0.0043247 | -0.0355473 | -0.0374459 | -0.0608628 | 0.0907141 | -0.0204107 | -0.0680004 | 0.0188460 | -0.0321719 | -0.0832777 | 0.0479238 | 0.0461657 | 0.0162617 |
| MEDICAL_SYRUPS | 0.0760346 | -0.0340354 | -0.0199712 | -0.0148026 | 0.0121655 | -0.0675961 | -0.3912837 | -0.0040083 | 0.0005977 | -0.0313456 | -0.0287613 | 0.0037622 | 0.0303084 | 0.0036040 | -0.0381316 | -0.0113568 | -0.0042017 | 7.039309e-01 | 7.1224992 | -0.0256496 | -0.0338244 | -0.0252277 | 0.0150487 | 0.0119897 | -0.0157343 | 0.0428958 | -0.0204107 | 0.2618403 | -0.0044478 | -0.0176857 | -0.0022151 | 0.0138005 | -0.0038501 | -0.0517738 | -0.0335959 |
| CHILD_FIRST_DENTIST_VISIT | 0.2295805 | -0.1005415 | -0.0208678 | 0.0626560 | -0.0723427 | 0.0291481 | 1.0518442 | -0.0412784 | 0.1714954 | 0.1186667 | -0.1399916 | -0.1521747 | 0.0198481 | 0.0427903 | -0.2445238 | 0.0020921 | -0.0126050 | 2.745779e+01 | 10.4734538 | -0.0135192 | 0.0022503 | -0.0642382 | 0.1223937 | 0.0457790 | 0.0823986 | 0.1698956 | -0.0680004 | -0.0044478 | 0.8653704 | -0.0690552 | 0.0110228 | 0.0992933 | -0.1185964 | -0.0850005 | 0.0096164 |
| SWEETS_DURING_PREGNANCY | -0.0888330 | 0.0190746 | 0.0233114 | 0.0154882 | 0.0333497 | -0.0111459 | -5.4078795 | 0.0068387 | -0.0525825 | -0.0347737 | 0.0642558 | 0.0375514 | 0.0324707 | -0.0272494 | 0.0846841 | 0.0158750 | -0.0126050 | -1.217176e+01 | 0.9171970 | 0.0165430 | 0.0074364 | 0.0310819 | -0.0262649 | -0.0255793 | -0.0388524 | -0.0288844 | 0.0188460 | -0.0176857 | -0.0690552 | 0.2179600 | -0.0248585 | -0.0337365 | 0.0548328 | 0.0403643 | 0.0067332 |
| FLUORIDE_SUPPLEMENTS_DURING_PREGNANCY | 0.0830315 | -0.0309237 | 0.0558876 | 0.0008790 | -0.0880595 | 0.0564502 | -4.8317218 | -0.0311698 | -0.0069618 | -0.0244366 | -0.0750501 | -0.0418410 | 0.1282128 | 0.0531803 | -0.0830667 | -0.0063816 | 0.0714286 | -4.396698e+00 | -6.2053022 | 0.0169825 | 0.0412609 | -0.0025140 | 0.1015435 | 0.0926831 | 0.0735206 | 0.0710770 | -0.0321719 | -0.0022151 | 0.0110228 | -0.0248585 | 0.6160121 | 0.0617067 | -0.1429978 | -0.0801308 | -0.0058894 |
| ORAL_HEALTH_DURING_PREGNANCY | 0.3362751 | -0.0452692 | -0.0096867 | 0.0354066 | -0.1114061 | 0.0018811 | 0.2974052 | -0.0414015 | 0.0973946 | 0.1156957 | -0.1705812 | -0.1366162 | 0.0914701 | 0.0916810 | -0.2072712 | -0.0311698 | 0.1176471 | 1.103695e+01 | -2.4352871 | -0.0105657 | -0.0121304 | -0.0451285 | 0.0585774 | 0.0801660 | 0.1057804 | 0.0841567 | -0.0832777 | 0.0138005 | 0.0992933 | -0.0337365 | 0.0617067 | 0.5309237 | -0.1268415 | -0.1219542 | -0.0431068 |
| MOTHER_HEALTH_AWARENESS | -0.2241307 | 0.0427903 | -0.0110052 | -0.0246827 | 0.0967441 | -0.0644492 | 2.7656728 | 0.0409620 | -0.1159418 | -0.1070989 | 0.1759783 | 0.1765761 | -0.0793748 | -0.0773355 | 0.2484793 | 0.0180198 | -0.0588235 | -1.529684e+01 | -9.5845259 | 0.0085088 | -0.0164200 | 0.0665588 | -0.0878134 | -0.0948103 | -0.0895011 | -0.0800429 | 0.0479238 | -0.0038501 | -0.1185964 | 0.0548328 | -0.1429978 | -0.1268415 | 0.2486551 | 0.1208291 | 0.0468865 |
| FATHER_HEALTH_AWARENESS | -0.3318449 | 0.0415597 | 0.0091769 | -0.0167364 | 0.0790057 | -0.0329630 | -10.2085546 | 0.0166661 | -0.0798847 | -0.0826272 | 0.1607187 | 0.1630393 | -0.0778102 | -0.0671741 | 0.2118421 | 0.0178088 | -0.0630252 | -2.070391e+00 | 3.2027355 | -0.0302380 | -0.0170353 | 0.0332443 | -0.0369185 | -0.0633417 | -0.0573116 | -0.0655743 | 0.0461657 | -0.0517738 | -0.0850005 | 0.0403643 | -0.0801308 | -0.1219542 | 0.1208291 | 0.3035055 | 0.0591927 |
| ECC | -0.1265427 | 0.0344925 | 0.0444077 | -0.0272846 | 0.0096691 | -0.0212897 | -6.4421785 | -0.0000527 | -0.0383601 | -0.0666995 | 0.0823283 | 0.0852994 | -0.0365318 | -0.0120601 | 0.0951971 | 0.0271615 | -0.0252101 | -2.902447e+00 | -1.9302767 | -0.0365493 | 0.0130445 | 0.0340002 | -0.0075419 | -0.0116733 | -0.0319961 | -0.0319433 | 0.0162617 | -0.0335959 | 0.0096164 | 0.0067332 | -0.0058894 | -0.0431068 | 0.0468865 | 0.0591927 | 0.2096973 |
kable(cor(NUM))
| | CHILD_ETHNICITY | CHILD_AGE | CHILD_GENDER | CHILD_SERBIAN_LANGUAGE | MOTHER_AGE | MARITAL_STATUS | MOTHER_ETHNICITY | MOTHER_SERBIAN_LANGUAGE | NUMBER_OF_CHILDREN | BIRTH_ORDER | MOTHER_EDUCATION_LEVEL | MOTHER_EMPLOYMENT_STATUS | QUALITY_OF_HOUSING | HOUSING_CONDITIONS | HOUSEHOLD_MONTHLY_INCOME | BIRTH_WEIGHT | BREASTFEEDING | BREASTFEEDING_FREQUENCY | BREASTFEEDING_DURING_NIGHT | BOTTLE_FEEDING | INFANT_FORMULAS | ADDITIONAL_FOOD_SWEETENING | CHILD_FLUORIDE_SUPPLEMENTS | CHILD_FLUORIDE_TOOTHPASTE | CHILD_ORAL_HYGIENE | CHILD_TOOTH_BRUSHING | DIARRHEA_DURING_INFANCY | MEDICAL_SYRUPS | CHILD_FIRST_DENTIST_VISIT | SWEETS_DURING_PREGNANCY | FLUORIDE_SUPPLEMENTS_DURING_PREGNANCY | ORAL_HEALTH_DURING_PREGNANCY | MOTHER_HEALTH_AWARENESS | FATHER_HEALTH_AWARENESS | ECC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CHILD_ETHNICITY | 1.0000000 | -0.1749489 | 0.0458069 | 0.3125799 | -0.2236191 | 0.0584724 | 0.2484739 | -0.2700453 | 0.2590974 | 0.2653392 | -0.5062151 | -0.3786649 | 0.0846922 | 0.4043858 | -0.4509273 | -0.1693861 | 0.1872540 | 0.0996975 | 0.0031334 | 0.0247279 | -0.0547197 | 0.0447553 | 0.0902875 | 0.2080885 | 0.2333370 | 0.1732119 | -0.3385299 | 0.1002083 | 0.1664351 | -0.1283208 | 0.0713443 | 0.3112358 | -0.3031192 | -0.4062213 | -0.1863595 |
| CHILD_AGE | -0.1749489 | 1.0000000 | 0.0682532 | -0.1128055 | 0.1843916 | 0.0252681 | 0.0860941 | 0.0400513 | -0.0134590 | -0.0554175 | 0.1337950 | 0.1173478 | -0.0615070 | -0.1092632 | 0.1735880 | 0.0530263 | -0.0837080 | -0.0441101 | -0.0344754 | -0.0428397 | 0.1136630 | -0.0245456 | -0.1018046 | -0.0721217 | -0.0448939 | -0.2448479 | 0.0556412 | -0.0852213 | -0.1384778 | 0.0523483 | -0.0504815 | -0.0796017 | 0.1099469 | 0.0966552 | 0.0965081 |
| CHILD_GENDER | 0.0458069 | 0.0682532 | 1.0000000 | -0.0770224 | -0.1768189 | 0.0310033 | -0.0794708 | -0.1465002 | 0.0149906 | -0.0387792 | -0.0616617 | -0.0119567 | 0.0930756 | 0.0739673 | 0.0135062 | -0.0173448 | -0.0652926 | -0.0844175 | -0.0985704 | 0.0811014 | 0.0028999 | 0.0170738 | -0.0624949 | 0.0251482 | -0.0047704 | -0.1115580 | 0.0375670 | -0.0780096 | -0.0448371 | 0.0998029 | 0.1423259 | -0.0265720 | -0.0441127 | 0.0332947 | 0.1938318 |
| CHILD_SERBIAN_LANGUAGE | 0.3125799 | -0.1128055 | -0.0770224 | 1.0000000 | -0.1814429 | 0.0303336 | -0.0536624 | -0.1506973 | 0.2155456 | 0.2105393 | -0.3156437 | -0.2095788 | 0.0520194 | 0.1547974 | -0.2379001 | -0.1298140 | 0.0000000 | -0.0285229 | -0.0402614 | 0.0046495 | -0.0514325 | 0.0786092 | 0.0556814 | 0.0580441 | 0.1425132 | 0.0752592 | -0.2365577 | -0.0847708 | 0.1973736 | 0.0972165 | 0.0032819 | 0.1423954 | -0.1450510 | -0.0890238 | -0.1746014 |
| MOTHER_AGE | -0.2236191 | 0.1843916 | -0.1768189 | -0.1814429 | 1.0000000 | -0.1289554 | 0.0322207 | 0.1874197 | 0.0557514 | 0.1258653 | 0.3549148 | 0.2798053 | -0.1904334 | -0.3326204 | 0.3433659 | 0.2685220 | -0.0508174 | 0.0007155 | 0.0323214 | -0.0097928 | -0.0188875 | 0.1048237 | -0.1232187 | -0.1798855 | -0.2128917 | -0.0931455 | 0.2361677 | 0.0328754 | -0.1075357 | 0.0987785 | -0.1551458 | -0.2114226 | 0.2682777 | 0.1983049 | 0.0291978 |
| MARITAL_STATUS | 0.0584724 | 0.0252681 | 0.0310033 | 0.0303336 | -0.1289554 | 1.0000000 | 0.0430172 | -0.2171557 | -0.0729327 | -0.0450615 | -0.1289093 | -0.0731570 | 0.0551055 | 0.2078917 | -0.1301317 | 0.0325620 | 0.0514674 | 0.0780334 | 0.0458139 | -0.0443525 | -0.0182229 | -0.0207782 | 0.0388605 | 0.1124570 | 0.2518079 | 0.1971379 | -0.1325336 | -0.2378627 | 0.0564198 | -0.0429882 | 0.1295072 | 0.0046485 | -0.2327245 | -0.1077373 | -0.0837136 |
| MOTHER_ETHNICITY | 0.2484739 | 0.0860941 | -0.0794708 | -0.0536624 | 0.0322207 | 0.0430172 | 1.0000000 | 0.0195604 | 0.0215248 | 0.0249630 | -0.0068414 | -0.0081914 | -0.0073500 | -0.0443807 | 0.0292392 | 0.0444335 | 0.0299848 | 0.0381773 | -0.0466430 | -0.0041086 | -0.0488039 | 0.0221784 | 0.0265423 | 0.0023795 | -0.0202092 | -0.0229144 | -0.0514265 | -0.0053478 | 0.0079077 | -0.0810102 | -0.0430535 | 0.0028545 | 0.0387885 | -0.1295931 | -0.0983869 |
| MOTHER_SERBIAN_LANGUAGE | -0.2700453 | 0.0400513 | -0.1465002 | -0.1506973 | 0.1874197 | -0.2171557 | 0.0195604 | 1.0000000 | -0.2186217 | -0.1291851 | 0.2324506 | 0.2453748 | -0.0904828 | -0.3009693 | 0.2272624 | 0.1342954 | -0.0184043 | 0.0439592 | 0.0291700 | -0.0488386 | 0.0981670 | -0.0805490 | -0.1036929 | -0.1981402 | -0.2378281 | -0.2246747 | 0.1752146 | -0.0176531 | -0.1000001 | 0.0330115 | -0.0894989 | -0.1280497 | 0.1851232 | 0.0681755 | -0.0002596 |
| NUMBER_OF_CHILDREN | 0.2590974 | -0.0134590 | 0.0149906 | 0.2155456 | 0.0557514 | -0.0729327 | 0.0215248 | -0.2186217 | 1.0000000 | 0.8100678 | -0.2958563 | -0.3345987 | 0.0885621 | 0.3032983 | -0.3498081 | -0.1193109 | -0.0540986 | -0.0376300 | -0.0537415 | 0.0443750 | 0.0174400 | -0.0068830 | 0.0421348 | 0.0123179 | 0.1542980 | 0.0737313 | -0.2293684 | 0.0015476 | 0.2442452 | -0.1492203 | -0.0117517 | 0.1770898 | -0.3080463 | -0.1921122 | -0.1109834 |
| BIRTH_ORDER | 0.2653392 | -0.0554175 | -0.0387792 | 0.2105393 | 0.1258653 | -0.0450615 | 0.0249630 | -0.1291851 | 0.8100678 | 1.0000000 | -0.2913549 | -0.3080443 | 0.0620992 | 0.2819634 | -0.3125075 | 0.0169959 | -0.0052460 | -0.0162645 | -0.0356915 | 0.0229449 | 0.0053645 | 0.0373840 | 0.0251868 | 0.0047065 | 0.1981514 | 0.1321104 | -0.1744274 | -0.0787001 | 0.1638872 | -0.0956930 | -0.0400002 | 0.2039944 | -0.2759329 | -0.1926890 | -0.1871299 |
| MOTHER_EDUCATION_LEVEL | -0.5062151 | 0.1337950 | -0.0616617 | -0.3156437 | 0.3549148 | -0.1289093 | -0.0068414 | 0.2324506 | -0.2958563 | -0.2913549 | 1.0000000 | 0.5153962 | -0.2048867 | -0.5012175 | 0.6296877 | 0.2192816 | -0.0788070 | -0.0435909 | -0.0038845 | 0.0559298 | 0.0360382 | 0.1188078 | -0.1476141 | -0.2312374 | -0.2915736 | -0.2030085 | 0.4390853 | -0.0834450 | -0.2234144 | 0.2043311 | -0.1419603 | -0.3475565 | 0.5239270 | 0.4331051 | 0.2669090 |
| MOTHER_EMPLOYMENT_STATUS | -0.3786649 | 0.1173478 | -0.0119567 | -0.2095788 | 0.2798053 | -0.0731570 | -0.0081914 | 0.2453748 | -0.3345987 | -0.3080443 | 0.5153962 | 1.0000000 | -0.1653301 | -0.4251562 | 0.5518383 | 0.2592017 | 0.0142261 | 0.0308273 | 0.0607014 | 0.0562224 | 0.0627102 | 0.1573608 | -0.1313884 | -0.1721122 | -0.1691943 | -0.1119042 | 0.2955486 | 0.0085384 | -0.1899748 | 0.0934099 | -0.0619102 | -0.2177413 | 0.4112329 | 0.3436875 | 0.2163238 |
| QUALITY_OF_HOUSING | 0.0846922 | -0.0615070 | 0.0930756 | 0.0520194 | -0.1904334 | 0.0551055 | -0.0073500 | -0.0904828 | 0.0885621 | 0.0620992 | -0.2048867 | -0.1653301 | 1.0000000 | 0.1792047 | -0.2062449 | -0.1012751 | 0.0587079 | -0.0256308 | -0.0429956 | -0.0595128 | -0.0249293 | 0.0375961 | 0.1457693 | 0.0554928 | 0.0240622 | 0.0549064 | -0.1020615 | 0.0655068 | 0.0235972 | 0.0769212 | 0.1806671 | 0.1388370 | -0.1760461 | -0.1562052 | -0.0882301 |
| HOUSING_CONDITIONS | 0.4043858 | -0.1092632 | 0.0739673 | 0.1547974 | -0.3326204 | 0.2078917 | -0.0443807 | -0.3009693 | 0.3032983 | 0.2819634 | -0.5012175 | -0.4251562 | 0.1792047 | 1.0000000 | -0.4128096 | -0.1825417 | 0.1355732 | 0.0082770 | -0.0581606 | 0.0269823 | -0.0437053 | 0.0568891 | 0.1603343 | 0.3099502 | 0.4093010 | 0.2645377 | -0.4905039 | 0.0233842 | 0.1527240 | -0.1937899 | 0.2249668 | 0.4177592 | -0.5149238 | -0.4048382 | -0.0874411 |
| HOUSEHOLD_MONTHLY_INCOME | -0.4509273 | 0.1735880 | 0.0135062 | -0.2379001 | 0.3433659 | -0.1301317 | 0.0292392 | 0.2272624 | -0.3498081 | -0.3125075 | 0.6296877 | 0.5518383 | -0.2062449 | -0.4128096 | 1.0000000 | 0.3137724 | -0.0523160 | -0.0685241 | -0.0217838 | 0.0743900 | 0.0937827 | 0.1821549 | -0.0563690 | -0.2099402 | -0.1583366 | -0.1377993 | 0.4383431 | -0.0681964 | -0.2405553 | 0.1660001 | -0.0968562 | -0.2603262 | 0.4560228 | 0.3519037 | 0.1902489 |
| BIRTH_WEIGHT | -0.1693861 | 0.0530263 | -0.0173448 | -0.1298140 | 0.2685220 | 0.0325620 | 0.0444335 | 0.1342954 | -0.1193109 | 0.0169959 | 0.2192816 | 0.2592017 | -0.1012751 | -0.1825417 | 0.3137724 | 1.0000000 | 0.0000000 | -0.0191671 | 0.0012635 | 0.0221744 | 0.0416514 | 0.0110357 | 0.0430318 | -0.0470056 | -0.0761745 | 0.0704314 | 0.2306957 | -0.0766100 | 0.0077628 | 0.1173737 | -0.0280662 | -0.1476604 | 0.1247374 | 0.1115829 | 0.2047404 |
| BREASTFEEDING | 0.1872540 | -0.0837080 | -0.0652926 | 0.0000000 | -0.0508174 | 0.0514674 | 0.0299848 | -0.0184043 | -0.0540986 | -0.0052460 | -0.0788070 | 0.0142261 | 0.0587079 | 0.1355732 | -0.0523160 | 0.0000000 | 1.0000000 | 0.6716940 | 0.6202096 | -0.1842625 | 0.0410946 | 0.0928214 | 0.0544859 | 0.0997585 | 0.0000000 | 0.1643929 | -0.1355732 | -0.0079798 | -0.0131684 | -0.0262388 | 0.0884434 | 0.1569109 | -0.1146412 | -0.1111781 | -0.0535015 |
| BREASTFEEDING_FREQUENCY | 0.0996975 | -0.0441101 | -0.0844175 | -0.0285229 | 0.0007155 | 0.0780334 | 0.0381773 | 0.0439592 | -0.0376300 | -0.0162645 | -0.0435909 | 0.0308273 | -0.0256308 | 0.0082770 | -0.0685241 | -0.0191671 | 0.6716940 | 1.0000000 | 0.8742465 | -0.5443940 | -0.2573781 | 0.0125761 | -0.0445471 | 0.0912814 | -0.0921700 | 0.0887317 | -0.0083638 | 0.0042822 | 0.0918800 | -0.0811561 | -0.0174377 | 0.0471508 | -0.0954903 | -0.0116984 | -0.0197299 |
| BREASTFEEDING_DURING_NIGHT | 0.0031334 | -0.0344754 | -0.0985704 | -0.0402614 | 0.0323214 | 0.0458139 | -0.0466430 | 0.0291700 | -0.0537415 | -0.0356915 | -0.0038845 | 0.0607014 | -0.0429956 | -0.0581606 | -0.0217838 | 0.0012635 | 0.6202096 | 0.8742465 | 1.0000000 | -0.4853097 | -0.2169348 | 0.0095075 | -0.0153886 | 0.0472320 | -0.1236475 | 0.0797280 | 0.0101439 | 0.0481502 | 0.0389469 | 0.0067961 | -0.0273497 | -0.0115616 | -0.0664899 | 0.0201104 | -0.0145817 |
| BOTTLE_FEEDING | 0.0247279 | -0.0428397 | 0.0811014 | 0.0046495 | -0.0097928 | -0.0443525 | -0.0041086 | -0.0488386 | 0.0443750 | 0.0229449 | 0.0559298 | 0.0562224 | -0.0595128 | 0.0269823 | 0.0743900 | 0.0221744 | -0.1842625 | -0.5443940 | -0.4853097 | 1.0000000 | 0.3945303 | 0.0900042 | 0.0341549 | -0.0767200 | 0.1399078 | -0.0315830 | 0.0497888 | -0.0551701 | -0.0159953 | 0.0390003 | 0.0238149 | -0.0159597 | 0.0187808 | -0.0604105 | -0.0878466 |
| INFANT_FORMULAS | -0.0547197 | 0.1136630 | 0.0028999 | -0.0514325 | -0.0188875 | -0.0182229 | -0.0488039 | 0.0981670 | 0.0174400 | 0.0053645 | 0.0360382 | 0.0627102 | -0.0249293 | -0.0437053 | 0.0937827 | 0.0416514 | 0.0410946 | -0.2573781 | -0.2169348 | 0.3945303 | 1.0000000 | -0.0252880 | 0.1860364 | -0.0201887 | 0.0728942 | 0.0119669 | 0.1560234 | -0.1330503 | 0.0048690 | 0.0320613 | 0.1058151 | -0.0335090 | -0.0662792 | -0.0622400 | 0.0573372 |
| ADDITIONAL_FOOD_SWEETENING | 0.0447553 | -0.0245456 | 0.0170738 | 0.0786092 | 0.1048237 | -0.0207782 | 0.0221784 | -0.0805490 | -0.0068830 | 0.0373840 | 0.1188078 | 0.1573608 | 0.0375961 | 0.0568891 | 0.1821549 | 0.0110357 | 0.0928214 | 0.0125761 | 0.0095075 | 0.0900042 | -0.0252880 | 1.0000000 | 0.0431841 | -0.0447967 | 0.0072240 | -0.0233293 | 0.0818506 | -0.0700448 | -0.0981092 | 0.0945880 | -0.0045508 | -0.0879938 | 0.1896374 | 0.0857335 | 0.1054878 |
| CHILD_FLUORIDE_SUPPLEMENTS | 0.0902875 | -0.1018046 | -0.0624949 | 0.0556814 | -0.1232187 | 0.0388605 | 0.0265423 | -0.1036929 | 0.0421348 | 0.0251868 | -0.1476141 | -0.1313884 | 0.1457693 | 0.1603343 | -0.0563690 | 0.0430318 | 0.0544859 | -0.0445471 | -0.0153886 | 0.0341549 | 0.1860364 | 0.0431841 | 1.0000000 | 0.1706321 | 0.1498048 | 0.0115259 | -0.0273714 | 0.0560603 | 0.2508031 | -0.1072413 | 0.2466224 | 0.1532459 | -0.3356887 | -0.1277426 | -0.0313950 |
| CHILD_FLUORIDE_TOOTHPASTE | 0.2080885 | -0.0721217 | 0.0251482 | 0.0580441 | -0.1798855 | 0.1124570 | 0.0023795 | -0.1981402 | 0.0123179 | 0.0047065 | -0.2312374 | -0.1721122 | 0.0554928 | 0.3099502 | -0.2099402 | -0.0470056 | 0.0997585 | 0.0912814 | 0.0472320 | -0.0767200 | -0.0201887 | -0.0447967 | 0.1706321 | 1.0000000 | 0.0894259 | 0.1742530 | -0.1696128 | 0.0336729 | 0.0707220 | -0.0787389 | 0.1697054 | 0.1581116 | -0.2732414 | -0.1652326 | -0.0366342 |
| CHILD_ORAL_HYGIENE | 0.2333370 | -0.0448939 | -0.0047704 | 0.1425132 | -0.2128917 | 0.2518079 | -0.0202092 | -0.2378281 | 0.1542980 | 0.1981514 | -0.2915736 | -0.1691943 | 0.0240622 | 0.4093010 | -0.1583366 | -0.0761745 | 0.0000000 | -0.0921700 | -0.1236475 | 0.1399078 | 0.0728942 | 0.0072240 | 0.1498048 | 0.0894259 | 1.0000000 | 0.4112432 | -0.2446159 | -0.0604989 | 0.1742756 | -0.1637369 | 0.1843028 | 0.2856318 | -0.3531400 | -0.2046807 | -0.1374730 |
| CHILD_TOOTH_BRUSHING | 0.1732119 | -0.2448479 | -0.1115580 | 0.0752592 | -0.0931455 | 0.1971379 | -0.0229144 | -0.2246747 | 0.0737313 | 0.1321104 | -0.2030085 | -0.1119042 | 0.0549064 | 0.2645377 | -0.1377993 | 0.0704314 | 0.1643929 | 0.0887317 | 0.0797280 | -0.0315830 | 0.0119669 | -0.0233293 | 0.0115259 | 0.1742530 | 0.4112432 | 1.0000000 | -0.2324441 | 0.0964274 | 0.2100800 | -0.0711669 | 0.1041689 | 0.1328545 | -0.1846409 | -0.1369161 | -0.0802393 |
| DIARRHEA_DURING_INFANCY | -0.3385299 | 0.0556412 | 0.0375670 | -0.2365577 | 0.2361677 | -0.1325336 | -0.0514265 | 0.1752146 | -0.2293684 | -0.1744274 | 0.4390853 | 0.2955486 | -0.1020615 | -0.4905039 | 0.4383431 | 0.2306957 | -0.1355732 | -0.0083638 | 0.0101439 | 0.0497888 | 0.1560234 | 0.0818506 | -0.0273714 | -0.1696128 | -0.2446159 | -0.2324441 | 1.0000000 | -0.1324347 | -0.2427019 | 0.1340276 | -0.1360956 | -0.3794679 | 0.3190912 | 0.2782269 | 0.1179052 |
| MEDICAL_SYRUPS | 0.1002083 | -0.0852213 | -0.0780096 | -0.0847708 | 0.0328754 | -0.2378627 | -0.0053478 | -0.0176531 | 0.0015476 | -0.0787001 | -0.0834450 | 0.0085384 | 0.0655068 | 0.0233842 | -0.0681964 | -0.0766100 | -0.0079798 | 0.0042822 | 0.0481502 | -0.0551701 | -0.1330503 | -0.0700448 | 0.0560603 | 0.0336729 | -0.0604989 | 0.0964274 | -0.1324347 | 1.0000000 | -0.0093439 | -0.0740315 | -0.0055155 | 0.0370135 | -0.0150887 | -0.1836576 | -0.1433743 |
| CHILD_FIRST_DENTIST_VISIT | 0.1664351 | -0.1384778 | -0.0448371 | 0.1973736 | -0.1075357 | 0.0564198 | 0.0079077 | -0.1000001 | 0.2442452 | 0.1638872 | -0.2234144 | -0.1899748 | 0.0235972 | 0.1527240 | -0.2405553 | 0.0077628 | -0.0131684 | 0.0918800 | 0.0389469 | -0.0159953 | 0.0048690 | -0.0981092 | 0.2508031 | 0.0707220 | 0.1742756 | 0.2100800 | -0.2427019 | -0.0093439 | 1.0000000 | -0.1590037 | 0.0150972 | 0.1464882 | -0.2556653 | -0.1658583 | 0.0225743 |
| SWEETS_DURING_PREGNANCY | -0.1283208 | 0.0523483 | 0.0998029 | 0.0972165 | 0.0987785 | -0.0429882 | -0.0810102 | 0.0330115 | -0.1492203 | -0.0956930 | 0.2043311 | 0.0934099 | 0.0769212 | -0.1937899 | 0.1660001 | 0.1173737 | -0.0262388 | -0.0811561 | 0.0067961 | 0.0390003 | 0.0320613 | 0.0945880 | -0.1072413 | -0.0787389 | -0.1637369 | -0.0711669 | 0.1340276 | -0.0740315 | -0.1590037 | 1.0000000 | -0.0678409 | -0.0991735 | 0.2355339 | 0.1569370 | 0.0314948 |
| FLUORIDE_SUPPLEMENTS_DURING_PREGNANCY | 0.0713443 | -0.0504815 | 0.1423259 | 0.0032819 | -0.1551458 | 0.1295072 | -0.0430535 | -0.0894989 | -0.0117517 | -0.0400002 | -0.1419603 | -0.0619102 | 0.1806671 | 0.2249668 | -0.0968562 | -0.0280662 | 0.0884434 | -0.0174377 | -0.0273497 | 0.0238149 | 0.1058151 | -0.0045508 | 0.2466224 | 0.1697054 | 0.1843028 | 0.1041689 | -0.1360956 | -0.0055155 | 0.0150972 | -0.0678409 | 1.0000000 | 0.1079000 | -0.3653726 | -0.1853197 | -0.0163862 |
| ORAL_HEALTH_DURING_PREGNANCY | 0.3112358 | -0.0796017 | -0.0265720 | 0.1423954 | -0.2114226 | 0.0046485 | 0.0028545 | -0.1280497 | 0.1770898 | 0.2039944 | -0.3475565 | -0.2177413 | 0.1388370 | 0.4177592 | -0.2603262 | -0.1476604 | 0.1569109 | 0.0471508 | -0.0115616 | -0.0159597 | -0.0335090 | -0.0879938 | 0.1532459 | 0.1581116 | 0.2856318 | 0.1328545 | -0.3794679 | 0.0370135 | 0.1464882 | -0.0991735 | 0.1079000 | 1.0000000 | -0.3490975 | -0.3038068 | -0.1291913 |
| MOTHER_HEALTH_AWARENESS | -0.3031192 | 0.1099469 | -0.0441127 | -0.1450510 | 0.2682777 | -0.2327245 | 0.0387885 | 0.1851232 | -0.3080463 | -0.2759329 | 0.5239270 | 0.4112329 | -0.1760461 | -0.5149238 | 0.4560228 | 0.1247374 | -0.1146412 | -0.0954903 | -0.0664899 | 0.0187808 | -0.0662792 | 0.1896374 | -0.3356887 | -0.2732414 | -0.3531400 | -0.1846409 | 0.3190912 | -0.0150887 | -0.2556653 | 0.2355339 | -0.3653726 | -0.3490975 | 1.0000000 | 0.4398347 | 0.2053303 |
| FATHER_HEALTH_AWARENESS | -0.4062213 | 0.0966552 | 0.0332947 | -0.0890238 | 0.1983049 | -0.1077373 | -0.1295931 | 0.0681755 | -0.1921122 | -0.1926890 | 0.4331051 | 0.3436875 | -0.1562052 | -0.4048382 | 0.3519037 | 0.1115829 | -0.1111781 | -0.0116984 | 0.0201104 | -0.0604105 | -0.0622400 | 0.0857335 | -0.1277426 | -0.1652326 | -0.2046807 | -0.1369161 | 0.2782269 | -0.1836576 | -0.1658583 | 0.1569370 | -0.1853197 | -0.3038068 | 0.4398347 | 1.0000000 | 0.2346327 |
| ECC | -0.1863595 | 0.0965081 | 0.1938318 | -0.1746014 | 0.0291978 | -0.0837136 | -0.0983869 | -0.0002596 | -0.1109834 | -0.1871299 | 0.2669090 | 0.2163238 | -0.0882301 | -0.0874411 | 0.1902489 | 0.2047404 | -0.0535015 | -0.0197299 | -0.0145817 | -0.0878466 | 0.0573372 | 0.1054878 | -0.0313950 | -0.0366342 | -0.1374730 | -0.0802393 | 0.1179052 | -0.1433743 | 0.0225743 | 0.0314948 | -0.0163862 | -0.1291913 | 0.2053303 | 0.2346327 | 1.0000000 |
To get an idea of possible outliers, we draw a boxplot for each numerical attribute.
for (col in 2:ncol(TRAIN)) {
boxplot(TRAIN[,col],main=paste("Boxplot of the",colnames(TRAIN)[col] ))
}
library(ade4)
library(data.table)
#COMBINE ALL SPLITS SO THAT ENCODING AND NORMALIZATION ARE CONSISTENT
ALL_DATA <- rbind(TRAIN, VALIDATION, TEST)
ALL_DATA_x <- ALL_DATA[,1:35]
ALL_DATA_y <- ALL_DATA[36]
#APPLY ONE HOT METHOD TO CATEGORICAL AND NULL(999) INVOLVING FEATURES
col_names <- c("CITY", "CHILD_ETHNICITY", "MOTHER_ETHNICITY", "BREASTFEEDING_FREQUENCY", "BREASTFEEDING_DURING_NIGHT", "MOTHER_EMPLOYMENT_STATUS")
for (f in col_names){
df_all_dummy = acm.disjonctif(ALL_DATA_x[f])
ALL_DATA_x[f] = NULL
ALL_DATA_x = cbind(ALL_DATA_x, df_all_dummy)
}
#DELETE .999 FEATURES
col_names999 <- c("MOTHER_ETHNICITY.999", "BREASTFEEDING_FREQUENCY.999", "BREASTFEEDING_DURING_NIGHT.999")
for (f in col_names999){
ALL_DATA_x[f] = NULL
}
#NORMALIZATION FUNCTION
normalize <- function(x) {
return ((x - min(x)) / (max(x) - min(x)))
}
#APPLY NORMALIZATION
ALL_DATA_x <- as.data.frame(lapply(ALL_DATA_x, normalize))
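The min-max scaling above maps each column to the range [0, 1]. The following is a minimal sketch of the same formula on a small vector; `normalize_safe`, `example_col` and the guard for constant columns are assumptions added for illustration and are not part of the original pipeline. A constant column would otherwise yield 0/0 = NaN.

```r
# Min-max scaling with a guard for constant columns (the guard is an assumption).
normalize_safe <- function(x) {
  rng <- max(x) - min(x)
  if (rng == 0) {
    return(rep(0, length(x)))  # constant column: avoid 0/0 = NaN
  }
  (x - min(x)) / rng
}

example_col <- c(2, 4, 6, 10)          # hypothetical attribute values
scaled <- normalize_safe(example_col)  # (x - 2) / 8 for each value
print(scaled)                          # 0.00 0.25 0.50 1.00

constant <- normalize_safe(c(1, 1, 1)) # all zeros instead of NaN
print(constant)
```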
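For the ordered dataset, we declare the ordinal attributes as ordered factors and the nominal attributes as unordered factors before one-hot encoding.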
ALL_DATA_x_o = ALL_DATA[,1:35]
factor_vars = c("CITY", "CHILD_ETHNICITY", "CHILD_GENDER", "MOTHER_SERBIAN_LANGUAGE",
"CHILD_SERBIAN_LANGUAGE", "MARITAL_STATUS", "MOTHER_ETHNICITY")
ordered_vars = c("CHILD_AGE", "MOTHER_AGE", "BIRTH_ORDER",
"MOTHER_EDUCATION_LEVEL", "MOTHER_EMPLOYMENT_STATUS", "QUALITY_OF_HOUSING",
"HOUSING_CONDITIONS", "HOUSEHOLD_MONTHLY_INCOME",
"BIRTH_WEIGHT", "BREASTFEEDING", "BREASTFEEDING_DURING_NIGHT",
"BOTTLE_FEEDING", "INFANT_FORMULAS", "ADDITIONAL_FOOD_SWEETENING",
"CHILD_FLUORIDE_SUPPLEMENTS", "CHILD_FLUORIDE_TOOTHPASTE", "CHILD_ORAL_HYGIENE",
"CHILD_TOOTH_BRUSHING", "DIARRHEA_DURING_INFANCY", "MEDICAL_SYRUPS",
"CHILD_FIRST_DENTIST_VISIT", "SWEETS_DURING_PREGNANCY",
"FLUORIDE_SUPPLEMENTS_DURING_PREGNANCY", "ORAL_HEALTH_DURING_PREGNANCY",
"MOTHER_HEALTH_AWARENESS", "FATHER_HEALTH_AWARENESS")
#ORDERED
for (var in ordered_vars) ALL_DATA_x_o[,var] = ordered(ALL_DATA_x_o[,var])
for (var in factor_vars) ALL_DATA_x_o[,var] = factor(ALL_DATA_x_o[,var])
#APPLY ONE HOT METHOD TO CATEGORICAL AND NULL(999) INVOLVING FEATURES
col_names <- c("CITY", "CHILD_ETHNICITY", "MOTHER_ETHNICITY", "BREASTFEEDING_FREQUENCY", "BREASTFEEDING_DURING_NIGHT", "MOTHER_EMPLOYMENT_STATUS")
for (f in col_names){
df_all_dummy = acm.disjonctif(ALL_DATA_x_o[f])
ALL_DATA_x_o[f] = NULL
ALL_DATA_x_o = cbind(ALL_DATA_x_o, df_all_dummy)
}
#DELETE .999 FEATURES
col_names999 <- c("MOTHER_ETHNICITY.999", "BREASTFEEDING_FREQUENCY.999", "BREASTFEEDING_DURING_NIGHT.999")
for (f in col_names999){
ALL_DATA_x_o[f] = NULL
}
col_names <- colnames(TRAIN)
TRAIN_factor <- as.data.frame(lapply(TRAIN[,col_names], factor))
rules1 <- apriori(TRAIN_factor, appearance = list(rhs=c("ECC=1"), default="lhs"), parameter = list(minlen=2, maxlen=7, sup = 0.1, conf = 0.4, target="rules"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.4 0.1 1 none FALSE TRUE 5 0.1 2
## maxlen target ext
## 7 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 23
##
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[125 item(s), 239 transaction(s)] done [0.00s].
## sorting and recoding items ... [93 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7
## Warning in apriori(TRAIN_factor, appearance = list(rhs = c("ECC=1"),
## default = "lhs"), : Mining stopped (maxlen reached). Only patterns up to a
## length of 7 returned!
## done [1.26s].
## writing ... [125 rule(s)] done [0.07s].
## creating S4 object ... done [0.11s].
rules1<-sort(rules1, decreasing=TRUE, by="confidence")
#inspect(rules1)
rules2 <- apriori(TRAIN_factor, appearance = list(rhs=c("ECC=2"), default="lhs"), parameter = list(minlen=2, maxlen=7, sup = 0.3, conf = 0.8, target="rules"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.3 2
## maxlen target ext
## 7 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 71
##
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[125 item(s), 239 transaction(s)] done [0.00s].
## sorting and recoding items ... [47 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7
## Warning in apriori(TRAIN_factor, appearance = list(rhs = c("ECC=2"),
## default = "lhs"), : Mining stopped (maxlen reached). Only patterns up to a
## length of 7 returned!
## done [0.03s].
## writing ... [246 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules2<-sort(rules2, decreasing=TRUE, by="confidence")
#inspect(rules2)
#SEPARATE TRAIN, VALIDATION AND TEST
TRAIN_conv_x <- ALL_DATA_x[1:239,]
VALIDATION_conv_x <- ALL_DATA_x[240:273,]
TEST_conv_x <- ALL_DATA_x[274:341,]
TRAIN_y <- TRAIN[,36]
TRAIN_y <- as.factor(TRAIN_y)
VALIDATION_y <- VALIDATION[,36]
VALIDATION_y <- as.factor(VALIDATION_y)
TEST_y <- TEST[,36]
TEST_y <- as.factor(TEST_y)
#POSSIBLE COST AND GAMMA VALUES
cost_try = c(0.1, 0.5, 1, 5, 10, 20, 50, 80, 100, 500)
gamma_try = c(0.005, 0.01, 0.02, 0.05, 0.1, 0.5, 1, 2, 5, 10)
#BEST COST AND GAMMA VALUES SELECTED ACCORDING TO ACCURACY
max_accur = 0
best_cost = 1
best_gamma = 1
for (i in 1:10)
{
for (j in 1:10)
{
svm_model <- svm(x = TRAIN_conv_x, y = TRAIN_y, gamma = gamma_try[j], cost = cost_try[i])
svm_res <- predict(svm_model, VALIDATION_conv_x)
conf_res <- confusionMatrix(svm_res, VALIDATION_y)
if (max_accur < conf_res$overall[1])
{
max_accur = conf_res$overall[1]
best_cost = cost_try[i]
best_gamma = gamma_try[j]
print(conf_res$overall[1])
}
}
}
## Accuracy
## 0.6764706
## Accuracy
## 0.7058824
## Accuracy
## 0.7647059
#BEST VALUES PRINTED
print(best_cost)
## [1] 5
print(best_gamma)
## [1] 0.01
#TEST DATASET IS PREDICTED AND RESULTS ARE DISPLAYED
svm_model <- svm(x = TRAIN_conv_x, y = TRAIN_y, gamma = best_gamma, cost = best_cost)
svm_res <- predict(svm_model, TEST_conv_x)
conf_res <- confusionMatrix(svm_res, TEST_y)
print(conf_res)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 2
## 1 5 2
## 2 17 44
##
## Accuracy : 0.7206
## 95% CI : (0.5985, 0.8227)
## No Information Rate : 0.6765
## P-Value [Acc > NIR] : 0.261543
##
## Kappa : 0.2236
## Mcnemar's Test P-Value : 0.001319
##
## Sensitivity : 0.22727
## Specificity : 0.95652
## Pos Pred Value : 0.71429
## Neg Pred Value : 0.72131
## Prevalence : 0.32353
## Detection Rate : 0.07353
## Detection Prevalence : 0.10294
## Balanced Accuracy : 0.59190
##
## 'Positive' Class : 1
##
For the SVM model, the cost parameter C controls how strongly we penalize training points that fall inside the margin or on the wrong side of it. Decreasing the cost widens the margin and tolerates more training errors. Gamma defines how far the influence of a single training example reaches.
If gamma is high, each training example influences only its immediate neighbourhood, so the decision boundary is shaped by the points closest to it and becomes more wiggly.
If gamma is low, the influence of each training example reaches far, so distant points also affect the boundary and it becomes smoother, closer to a straight line.
Neither gamma nor C should be very high, because that leads to overfitting, nor very small, because that leads to underfitting. We therefore need to choose values of C and gamma that give a good fit. In our case, we tried the cost and gamma values listed above and kept the pair with the highest validation accuracy (cost = 5, gamma = 0.01).
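The effect of gamma can be seen directly in the radial basis function kernel, exp(-gamma * ||u - v||^2), which is the default kernel of `svm` in e1071. The sketch below is only an illustration in base R; `rbf_kernel` and the three points are hypothetical and are not taken from the ECC data.

```r
# RBF kernel value between two points: exp(-gamma * squared Euclidean distance)
rbf_kernel <- function(u, v, gamma) {
  exp(-gamma * sum((u - v)^2))
}

origin <- c(0, 0)
near   <- c(0.5, 0)  # hypothetical point at distance 0.5
far    <- c(2, 0)    # hypothetical point at distance 2

# For small gamma both points keep a kernel value close to 1;
# for large gamma the far point's value drops to almost 0.
for (g in c(0.01, 1, 10)) {
  k_near <- rbf_kernel(origin, near, g)
  k_far  <- rbf_kernel(origin, far, g)
  cat(sprintf("gamma = %5.2f  near = %.4f  far = %.4f\n", g, k_near, k_far))
}
```

With gamma = 0.01, the value selected above, distant points still contribute noticeably, which is consistent with a smooth decision boundary.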
#SEPARATE TRAIN, VALIDATION AND TEST
TRAIN_conv_x <- ALL_DATA_x[1:239,]
VALIDATION_conv_x <- ALL_DATA_x[240:273,]
TEST_conv_x <- ALL_DATA_x[274:341,]
TRAIN_y <- TRAIN[,36]
TRAIN_y <- as.factor(TRAIN_y)
VALIDATION_y <- VALIDATION[,36]
VALIDATION_y <- as.factor(VALIDATION_y)
TEST_y <- TEST[,36]
TEST_y <- as.factor(TEST_y)
#BEST K VALUE IS SELECTED ACCORDING TO ACCURACY
max_accur = 0
best_k_val = 1
for (i in 1:100)
{
test_pred <- knn(train = TRAIN_conv_x, test = VALIDATION_conv_x, cl = TRAIN_y, k=i)
conf_res <- confusionMatrix(test_pred, VALIDATION_y)
if (max_accur < conf_res$overall[1])
{
max_accur = conf_res$overall[1]
best_k_val = i
print(conf_res$overall[1])
}
}
## Accuracy
## 0.7058824
#BEST VALUES PRINTED
print(best_k_val)
## [1] 1
#TEST DATASET IS PREDICTED AND RESULTS ARE DISPLAYED
test_pred <- knn(train = TRAIN_conv_x, test = TEST_conv_x, cl = TRAIN_y, k=best_k_val)
conf_res <- confusionMatrix(test_pred, TEST_y)
print(conf_res)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 2
## 1 8 8
## 2 14 38
##
## Accuracy : 0.6765
## 95% CI : (0.5521, 0.7849)
## No Information Rate : 0.6765
## P-Value [Acc > NIR] : 0.5575
##
## Kappa : 0.2043
## Mcnemar's Test P-Value : 0.2864
##
## Sensitivity : 0.3636
## Specificity : 0.8261
## Pos Pred Value : 0.5000
## Neg Pred Value : 0.7308
## Prevalence : 0.3235
## Detection Rate : 0.1176
## Detection Prevalence : 0.2353
## Balanced Accuracy : 0.5949
##
## 'Positive' Class : 1
##
We also try KNN on the ordered dataset.
#SEPARATE TRAIN, VALIDATION AND TEST
TRAIN_conv_x <- ALL_DATA_x_o[1:239,]
VALIDATION_conv_x <- ALL_DATA_x_o[240:273,]
TEST_conv_x <- ALL_DATA_x_o[274:341,]
TRAIN_y <- TRAIN[,36]
TRAIN_y <- as.factor(TRAIN_y)
VALIDATION_y <- VALIDATION[,36]
VALIDATION_y <- as.factor(VALIDATION_y)
TEST_y <- TEST[,36]
TEST_y <- as.factor(TEST_y)
#BEST K VALUE IS SELECTED ACCORDING TO ACCURACY
max_accur = 0
best_k_val = 1
for (i in 1:100)
{
test_pred <- knn(train = TRAIN_conv_x, test = VALIDATION_conv_x, cl = TRAIN_y, k=i)
conf_res <- confusionMatrix(test_pred, VALIDATION_y)
if (max_accur < conf_res$overall[1])
{
max_accur = conf_res$overall[1]
best_k_val = i
print(conf_res$overall[1])
}
}
## Accuracy
## 0.5882353
## Accuracy
## 0.6176471
## Accuracy
## 0.6470588
## Accuracy
## 0.6764706
#BEST VALUES PRINTED
print(best_k_val)
## [1] 39
#TEST DATASET IS PREDICTED AND RESULTS ARE DISPLAYED
test_pred <- knn(train = TRAIN_conv_x, test = TEST_conv_x, cl = TRAIN_y, k=best_k_val)
conf_res <- confusionMatrix(test_pred, TEST_y)
print(conf_res)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 2
## 1 0 0
## 2 22 46
##
## Accuracy : 0.6765
## 95% CI : (0.5521, 0.7849)
## No Information Rate : 0.6765
## P-Value [Acc > NIR] : 0.5575
##
## Kappa : 0
## Mcnemar's Test P-Value : 7.562e-06
##
## Sensitivity : 0.0000
## Specificity : 1.0000
## Pos Pred Value : NaN
## Neg Pred Value : 0.6765
## Prevalence : 0.3235
## Detection Rate : 0.0000
## Detection Prevalence : 0.0000
## Balanced Accuracy : 0.5000
##
## 'Positive' Class : 1
##
For the KNN model, the only parameter we tune is k. For each new example, the model looks through the training data, finds the k training examples closest to it, and assigns the most common class label among them.
When the normalized data is fed to the model directly, k = 1 gives the best validation accuracy among all values from 1 to 100. Normally, k = 1 is prone to overfitting, but in our case we did not observe clear signs of it. As our class label is nominal with only two values, 1-NN does not directly show overfitting.
We also tried this model on the ordered dataset. The best k is 39 rather than 1 in this case, yet the test accuracy is unchanged at 0.6765. With k = 39 the model assigns every test example to class 2, so its accuracy equals the no-information rate.
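The voting rule described above can be sketched in a few lines of base R. This is an illustrative re-implementation, not the `knn` function from the class package used in this report (which, for example, breaks ties at random); `knn_predict_one` and the toy data are hypothetical.

```r
# Classify one query point by majority vote among its k nearest training points.
knn_predict_one <- function(train_x, train_y, query, k) {
  # Euclidean distance from the query to every training row
  query_mat <- matrix(query, nrow(train_x), length(query), byrow = TRUE)
  dists <- sqrt(rowSums((train_x - query_mat)^2))
  nearest <- order(dists)[seq_len(k)]
  votes <- table(train_y[nearest])
  names(votes)[which.max(votes)]  # ties resolve to the first label in table order
}

# Hypothetical toy data: two features, class labels 1 and 2
toy_x <- matrix(c(0.0, 0.0,
                  0.1, 0.2,
                  0.9, 1.0,
                  1.0, 0.8,
                  0.8, 0.9), ncol = 2, byrow = TRUE)
toy_y <- factor(c(1, 1, 2, 2, 2))

pred_k1 <- knn_predict_one(toy_x, toy_y, c(0.2, 0.1), k = 1)
pred_k5 <- knn_predict_one(toy_x, toy_y, c(0.2, 0.1), k = 5)
print(pred_k1)  # nearest neighbour belongs to class 1
print(pred_k5)  # all five points vote, so the majority class 2 wins
```

The second call shows why a large k can collapse to the majority class, as happened with k = 39 on the ordered dataset.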
#SEPARATE TEST
TEST_conv_x <- ALL_DATA_x[274:341,]
TEST_y <- TEST[,36]
TEST_y <- as.factor(TEST_y)
#VALIDATION COMBINED WITH TRAIN
TV_conv_x <- ALL_DATA_x[1:273,]
TV_y <- c(TRAIN_y, VALIDATION_y)
TV_y <- as.factor(TV_y)
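#NAIVE BAYES HAS NO PARAMETER TO TUNE HERE, SO IT IS FIT DIRECTLY ON TRAIN + VALIDATION
#LAPLACE SMOOTHING ONLY AFFECTS CATEGORICAL PREDICTORS; ALL COLUMNS HERE ARE NUMERIC
nb_model <- naiveBayes(x = TV_conv_x, y = TV_y, laplace = 0)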
nb_res <- predict(nb_model, TEST_conv_x)
conf_res <- confusionMatrix(nb_res, TEST_y)
print(conf_res)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 2
## 1 15 28
## 2 7 18
##
## Accuracy : 0.4853
## 95% CI : (0.3622, 0.6097)
## No Information Rate : 0.6765
## P-Value [Acc > NIR] : 0.9996453
##
## Kappa : 0.0585
## Mcnemar's Test P-Value : 0.0007232
##
## Sensitivity : 0.6818
## Specificity : 0.3913
## Pos Pred Value : 0.3488
## Neg Pred Value : 0.7200
## Prevalence : 0.3235
## Detection Rate : 0.2206
## Detection Prevalence : 0.6324
## Balanced Accuracy : 0.5366
##
## 'Positive' Class : 1
##
#SEPARATE TRAIN, VALIDATION AND TEST
TRAIN_conv_x <- ALL_DATA_x[1:239,]
VALIDATION_conv_x <- ALL_DATA_x[240:273,]
TEST_conv_x <- ALL_DATA_x[274:341,]
TRAIN_y <- TRAIN[,36]
TRAIN_y <- as.factor(TRAIN_y)
VALIDATION_y <- VALIDATION[,36]
VALIDATION_y <- as.factor(VALIDATION_y)
TEST_y <- TEST[,36]
TEST_y <- as.factor(TEST_y)
#BEST NTREE VALUE IS SELECTED ACCORDING TO ACCURACY
max_accur = 0
res_num_of_tree = 0
num_of_tree = 16
for (i in 1:7)
{
set.seed(97)
rf_model <- randomForest(x = TRAIN_conv_x, y = TRAIN_y, ntree = num_of_tree)
rf_res <- predict(rf_model, VALIDATION_conv_x)
rf_res_round <- as.factor(round(as.numeric(rf_res)))
conf_res <- confusionMatrix(rf_res_round, VALIDATION_y)
if (conf_res$overall[1] > max_accur)
{
max_accur = conf_res$overall[1]
res_num_of_tree = num_of_tree
print(conf_res$overall[1])
}
num_of_tree = num_of_tree*2
}
## Accuracy
## 0.5882353
## Accuracy
## 0.6470588
## Accuracy
## 0.6764706
#BEST VALUES PRINTED
print(res_num_of_tree)
## [1] 64
#TEST DATASET IS PREDICTED AND RESULTS ARE DISPLAYED
set.seed(97)
rf_res_model <- randomForest(x = TRAIN_conv_x, y = TRAIN_y, ntree = res_num_of_tree)
rf_res <- predict(rf_res_model, TEST_conv_x)
rf_res_round <- as.factor(round(as.numeric(rf_res)))
conf_res <- confusionMatrix(rf_res_round, TEST_y)
print(conf_res)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 2
## 1 4 2
## 2 18 44
##
## Accuracy : 0.7059
## 95% CI : (0.5829, 0.8102)
## No Information Rate : 0.6765
## P-Value [Acc > NIR] : 0.3537252
##
## Kappa : 0.1707
## Mcnemar's Test P-Value : 0.0007962
##
## Sensitivity : 0.18182
## Specificity : 0.95652
## Pos Pred Value : 0.66667
## Neg Pred Value : 0.70968
## Prevalence : 0.32353
## Detection Rate : 0.05882
## Detection Prevalence : 0.08824
## Balanced Accuracy : 0.56917
##
## 'Positive' Class : 1
##
The same model is applied to the ordered dataset.
#SEPARATE TRAIN, VALIDATION AND TEST
TRAIN_conv_x <- ALL_DATA_x_o[1:239,]
VALIDATION_conv_x <- ALL_DATA_x_o[240:273,]
TEST_conv_x <- ALL_DATA_x_o[274:341,]
TRAIN_y <- TRAIN[,36]
TRAIN_y <- as.factor(TRAIN_y)
VALIDATION_y <- VALIDATION[,36]
VALIDATION_y <- as.factor(VALIDATION_y)
TEST_y <- TEST[,36]
TEST_y <- as.factor(TEST_y)
#BEST NTREE VALUE IS SELECTED ACCORDING TO ACCURACY
max_accur = 0
res_num_of_tree = 0
#CANDIDATE NTREE VALUES: 2, 12, 22, ..., 992
ntrees = seq(2, 1000, by = 10)
for (i in ntrees)
{
set.seed(97)
rf_model <- randomForest(x = TRAIN_conv_x, y = TRAIN_y, ntree = i)
rf_res <- predict(rf_model, VALIDATION_conv_x)
rf_res_round <- as.factor(round(as.numeric(rf_res)))
conf_res <- confusionMatrix(rf_res_round, VALIDATION_y)
if (conf_res$overall[1] > max_accur)
{
max_accur = conf_res$overall[1]
res_num_of_tree = i
print(conf_res$overall[1])
}
}
## Accuracy
## 0.5588235
## Accuracy
## 0.6764706
## Accuracy
## 0.7647059
#BEST VALUES PRINTED
print(res_num_of_tree)
## [1] 2048
#TEST DATASET IS PREDICTED AND RESULTS ARE DISPLAYED
set.seed(97)
rf_res_model <- randomForest(x = TRAIN_conv_x, y = TRAIN_y, ntree = res_num_of_tree)
rf_res <- predict(rf_res_model, TEST_conv_x)
rf_res_round <- as.factor(round(as.numeric(rf_res)))
conf_res <- confusionMatrix(rf_res_round, TEST_y)
print(conf_res)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 2
## 1 7 2
## 2 15 44
##
## Accuracy : 0.75
## 95% CI : (0.6302, 0.8471)
## No Information Rate : 0.6765
## P-Value [Acc > NIR] : 0.120363
##
## Kappa : 0.3248
## Mcnemar's Test P-Value : 0.003609
##
## Sensitivity : 0.3182
## Specificity : 0.9565
## Pos Pred Value : 0.7778
## Neg Pred Value : 0.7458
## Prevalence : 0.3235
## Detection Rate : 0.1029
## Detection Prevalence : 0.1324
## Balanced Accuracy : 0.6374
##
## 'Positive' Class : 1
##
The most important parameter for this model is the number of trees. We tried a range of values for it and selected the one with the highest validation accuracy.
Note that when the nominal attributes are not declared as such and the data is passed to the model directly, the selected number of trees is 64. When we preprocess the data to specify the attribute types, the selected value becomes 2048. There is a trade-off: increasing the number of trees can improve accuracy, but it also uses more memory.
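The statistics printed with the test confusion matrix above can be reproduced by hand, which helps when reading them. The sketch below uses base R only and re-enters the four counts from that printout; the variable names are illustrative and are not part of the original analysis.

```r
# Test-set confusion matrix as printed above (rows: prediction, columns: reference).
cm <- matrix(c(7, 15, 2, 44), nrow = 2,
             dimnames = list(Prediction = c("1", "2"), Reference = c("1", "2")))

n <- sum(cm)
accuracy    <- sum(diag(cm)) / n
sensitivity <- cm["1", "1"] / sum(cm[, "1"])   # the positive class is "1"
specificity <- cm["2", "2"] / sum(cm[, "2"])
balanced_accuracy <- (sensitivity + specificity) / 2
nir <- max(colSums(cm)) / n                    # share of the majority class

# Cohen's kappa: agreement beyond what the row and column totals alone would give.
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, balanced_accuracy = balanced_accuracy,
        nir = nir, kappa = cohen_kappa), 4)
```

The low sensitivity next to the high specificity shows that most of the errors are class 1 cases predicted as class 2, which accuracy alone does not reveal.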
library(neuralnet)
TRAIN_conv_x <- ALL_DATA_x[1:239,]
VALIDATION_conv_x <- ALL_DATA_x[240:273,]
TEST_conv_x <- ALL_DATA_x[274:341,]
TRAIN_y <- TRAIN[,36]
TRAIN_y <- TRAIN_y - 1
VALIDATION_y <- VALIDATION[,36]
VALIDATION_y <- as.factor(VALIDATION_y-1)
TEST_y <- TEST[,36]
TEST_y <- as.factor(TEST_y - 1)
train_data <- data.frame(TRAIN_conv_x, TRAIN_y)
col_names = colnames(train_data)
for (i in col_names)
{
train_data[,i] <- as.numeric(train_data[,i])
}
col_names = colnames(TRAIN_conv_x)
formula_asd <- as.formula(paste("TRAIN_y ~ ", paste(col_names, collapse = "+")))
VALIDATION_conv_x_nn <- VALIDATION_conv_x
col_names = colnames(VALIDATION_conv_x_nn)
for (i in col_names)
{
VALIDATION_conv_x_nn[,i] <- as.numeric(VALIDATION_conv_x_nn[,i])
}
TEST_conv_x_nn <- TEST_conv_x
col_names = colnames(TEST_conv_x_nn)
for (i in col_names)
{
TEST_conv_x_nn[,i] <- as.numeric(TEST_conv_x_nn[,i])
}
nn_result_f <- function(x) {
ret_val = 0
if ( x >= 0.5 )
{
ret_val <- 1
}
else
{
ret_val <- 0
}
return (ret_val)
}
max_accur = 0
best_l1_num = 1
best_th = 1
for (i in 1:20)
{
for (j in 1:5)
{
nn_model <- neuralnet(formula_asd, data=train_data, linear.output = TRUE, hidden=c(i,1), threshold=0.01*j)
nn_res <- compute(nn_model, VALIDATION_conv_x_nn)$net.result
nn_res <- as.numeric(lapply(nn_res, nn_result_f))
nn_res <- as.factor(nn_res)
conf_res <- confusionMatrix(nn_res, VALIDATION_y)
if (max_accur < conf_res$overall[1])
{
max_accur = conf_res$overall[1]
best_l1_num = i
best_th = 0.01*j
print(conf_res$overall[1])
}
}
print(i)
}
## Accuracy
## 0.4705882353
## Accuracy
## 0.5294117647
## Accuracy
## 0.6764705882
## [1] 1
## Warning in confusionMatrix.default(nn_res, VALIDATION_y): Levels are not in
## the same order for reference and data. Refactoring data to match.
## [1] 2
## [1] 3
## Accuracy
## 0.7352941176
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## Accuracy
## 0.7647058824
## [1] 9
## Warning in confusionMatrix.default(nn_res, VALIDATION_y): Levels are not in
## the same order for reference and data. Refactoring data to match.
## [1] 10
## [1] 11
## Accuracy
## 0.7941176471
## [1] 12
## [1] 13
## [1] 14
## [1] 15
## [1] 16
## [1] 17
## Accuracy
## 0.8235294118
## [1] 18
## [1] 19
## [1] 20
print(best_l1_num)
## [1] 18
print(best_th)
## [1] 0.01
nn_model <- neuralnet(formula_asd, data=train_data, linear.output = TRUE, hidden=c(best_l1_num,1), threshold=best_th) # same architecture as in the search loop
nn_res <- compute(nn_model, TEST_conv_x_nn)$net.result
nn_res <- as.numeric(lapply(nn_res, nn_result_f))
nn_res <- as.factor(nn_res)
conf_res <- confusionMatrix(nn_res, TEST_y)
print(conf_res)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 12 7
## 1 10 39
##
## Accuracy : 0.75
## 95% CI : (0.6301776, 0.8471195)
## No Information Rate : 0.6764706
## P-Value [Acc > NIR] : 0.1203633
##
## Kappa : 0.4077869
## Mcnemar's Test P-Value : 0.6276258
##
## Sensitivity : 0.5454545
## Specificity : 0.8478261
## Pos Pred Value : 0.6315789
## Neg Pred Value : 0.7959184
## Prevalence : 0.3235294
## Detection Rate : 0.1764706
## Detection Prevalence : 0.2794118
## Balanced Accuracy : 0.6966403
##
## 'Positive' Class : 0
##
# K-means on the combined feature data (train, validation and test rows)
X = ALL_DATA_x
# Applying k-means to the dataset with k = 10
set.seed(13)
km_fit = kmeans(X, 10, iter.max = 500) # named km_fit so that the kmeans() function is not masked
# Visualizing library
# install.packages("cluster")
library(cluster)
clusplot(X,
km_fit$cluster,
lines = 0, # no line wanted
shade = TRUE, # shade depending on the density
color = TRUE,
labels = 0,
plotchar = FALSE,
span = TRUE,
main = paste("Clusters of Data"),
xlab = "x-axis",
ylab = "y-axis")
The initial configuration is fixed with a seed. We run k-means for k = 1 to 50 and plot the within-cluster sum of squares against k to find the optimal number of clusters with the elbow method.
set.seed(123)
wcss = vector() # an empty vector
for (i in 1:50) wcss[i] = sum(kmeans(X, i)$withinss)
plot(1:50, wcss, type = "b", main = paste("Clusters"), xlab = "# Clusters", ylab = "Within Cluster SS")
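Reading the elbow off the plot is a judgment call. One possible heuristic, shown here only as a sketch, is to pick the k where the curve bends most sharply, measured by its discrete second difference. The data below is simulated (three well-separated groups), not the ECC data, and `X_demo`, `wcss_demo` and `elbow_k` are hypothetical names.

```r
set.seed(123)
# Simulated stand-in data: three well-separated 2-D groups (not the ECC data).
centers <- rbind(c(0, 0), c(10, 0), c(5, 8.66))
X_demo <- do.call(rbind, lapply(1:3, function(g) {
  cbind(rnorm(50, centers[g, 1], 0.5), rnorm(50, centers[g, 2], 0.5))
}))

k_values <- 1:8
wcss_demo <- sapply(k_values, function(k) {
  # nstart = 25 keeps the best of 25 random initializations for each k
  kmeans(X_demo, centers = k, nstart = 25)$tot.withinss
})

# Second difference wcss[k-1] - 2 * wcss[k] + wcss[k+1], defined for k = 2..7.
curvature <- diff(wcss_demo, differences = 2)
elbow_k <- k_values[which.max(curvature) + 1]
elbow_k
```

On real data the curve is often smoother than in this toy case, so the heuristic is best treated as a starting point to check against the plot.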
In this section, we also apply hierarchical clustering. To see which linkage works best for well-separated data, we plot the dendrogram of each linkage in a for loop. As the dendrograms show, the best separation is obtained with ward.D.
# 2.1. H-clust with different linkages
X = ALL_DATA_x_o
dend = list(list(),list(),list())
meth = c("ward.D", "single", "average")
names(dend) = meth
# Using dendrogram to find the opt num of clusters
for (i in 1:3) {
dend[[i]] = hclust(dist(X, method = "euclidean"), method = meth[i]) # distance: Euclidean, linkage: meth[i]
plot(dend[[i]],
main = paste("Dendrogram using", meth[i], sep = " " ), # title
xlab = "Points",
ylab = paste("Euclidean", "Distance", sep = " ")
)
}
# Fitting hierarchical clustering with k = 2 (found using dendrogram)
numClus = 2
hc = hclust(dist(X, method = "euclidean"), method = "ward.D") # same function with different var.name
y_hc = cutree(hc, k = numClus) # cut the tree into numClus groups
# Visualizing the clusters
# install.packages("cluster")
library(cluster)
clusplot(X,
y_hc,
lines = 0, # no lines between cluster centers
shade = TRUE,
color = TRUE,
labels = 1, # 1: pick points to label interactively, 2: label all points
plotchar = FALSE,
span = TRUE, # draw the smallest ellipse that contains each cluster
main = paste("Clusters of Well Separated Data using ward.D"),
xlab = "X1",
ylab = "X2")
clus_size = vector(length = numClus)
for (i in 1:length(y_hc)) clus_size[y_hc[i]] = clus_size[y_hc[i]]+1
show(clus_size)
## [1] 322 19
For the hierarchical clustering parameters, we first plot the dendrogram of the clusters. The dendrogram shows the separation distance (height) of each linkage. We then choose the number of clusters by cutting the tree where this distance is largest.
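The rule of cutting at the longest vertical stretch can also be applied numerically to the merge heights stored in the `hclust` result. The following is a minimal sketch on simulated data (two well-separated groups, not the ECC data); `X_demo`, `hc_demo` and `k_demo` are hypothetical names used only for illustration.

```r
set.seed(42)
# Simulated stand-in data: two well-separated 2-D groups (not the ECC data).
X_demo <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
                matrix(rnorm(60, mean = 20), ncol = 2))

hc_demo <- hclust(dist(X_demo, method = "euclidean"), method = "ward.D")

# hc_demo$height holds the n - 1 merge heights in increasing order.
# Cutting inside the widest gap between consecutive heights leaves
# n - j clusters, where j is the index of the merge just below the gap.
gaps <- diff(hc_demo$height)
j <- which.max(gaps)
k_demo <- nrow(X_demo) - j

groups <- cutree(hc_demo, k = k_demo)
table(groups)
```

With linkages that can produce non-monotone heights (for example centroid linkage), the heights would need to be checked before applying this rule.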
# Compute DBSCAN using fpc package
library("fpc")
set.seed(123)
df = ALL_DATA_x
db <- fpc::dbscan(df, eps = 2.6, MinPts = 3)
# Plot DBSCAN results
library("factoextra")
fviz_cluster(db, data = df, stand = FALSE,
ellipse = FALSE, show.clust.cent = FALSE,
geom = "point",palette = "jco", ggtheme = theme_classic())
Once the parameters of all models were set, the following accuracy results were achieved.
Among these models, Random Forest and ANN give the best accuracy. Of the two, ANN is harder to implement, whereas Random Forest is much simpler. The drawback of Random Forest is that the number of trees can grow large, which may lead to memory issues.
SVM and KNN give mid-level results. They are also easy to implement and tune.
The Naive Bayesian classifier performs poorly on this dataset, with the lowest accuracy of all the models.
#models <- c("ANN ","Random Forest", "SVM", "KNN", "Naive Bayesian")
#accuracies <- c(0.82, 0.75, 0.72, 0.69, 0.48)
Now, we compare our clustering models using WCSS analysis. WCSS is the within-cluster sum of squares: kmeans reports one value per cluster in withinss and their sum in tot.withinss. We begin with an empty wcss vector and record the total within-cluster sum of squares of the model for each of 100 different initial configurations. We can then view the error of each run and look at the indices with the minimum error.
wcss = vector() # an empty vector, reset so that earlier values are not reused
for (i in 1:100) {
set.seed(i*20)
wcss[i] = kmeans(X, 10)$tot.withinss # total within-cluster SS for this seed
}
plot(20*(1:100), wcss, type = "b", main = paste("Clusters"), xlab = "Initial Seed", ylab = "Within Cluster SS")
which(wcss == min(wcss)) # initial conditions with minimum error
## [1] 29
insens_init = length(which(wcss == min(wcss)))/100
insens_init
## [1] 0.01
In the above analysis, we fitted k-means models with k = 10 from 100 different initialization points by changing the random seed. We then recorded the total within-cluster sum of squares of each run and compared the runs to measure how sensitive the result is to the initialization point.
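Comparing floating-point errors with exact equality can undercount runs that reach the same solution, so a small tolerance may be safer. The sketch below illustrates this on simulated data (not the ECC data) and also shows the `nstart` argument of `kmeans`, which repeats the random initialization internally and keeps the best fit. The names `X_demo`, `tot_wss` and `share_best` and the tolerance value are assumptions made for illustration.

```r
set.seed(1)
# Simulated stand-in data (not the ECC data): 200 points, 5 numeric features.
X_demo <- matrix(rnorm(1000), ncol = 5)

seeds <- 20 * (1:20)
tot_wss <- sapply(seeds, function(s) {
  set.seed(s)
  kmeans(X_demo, centers = 10, iter.max = 100)$tot.withinss
})

# Share of runs that reach the best error found, up to a small relative tolerance.
tol <- 1e-8
share_best <- mean(tot_wss <= min(tot_wss) * (1 + tol))

# nstart = 25 tries 25 random initializations and returns the best of them.
set.seed(1)
fit_multi <- kmeans(X_demo, centers = 10, iter.max = 100, nstart = 25)

c(share_best = share_best,
  best_single = min(tot_wss),
  multi_start = fit_multi$tot.withinss)
```

A small `share_best` suggests that single runs depend strongly on the starting point, in which case a multi-start fit is the more stable choice.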
In our analysis, we have observed that increasing k-value significantly